When hearing the term beacon, people might first think of flares or a lighthouse. A BEACON file can indeed serve as a guide in the ocean of different authority files on the web. In this post, I will take a closer look at BEACON files, their implementation, and why they are a useful addition to discovery systems like the Specialised Information Service Performing Arts (FID DK).

Introduction to BEACON

Authority data disambiguates and represents controlled entities like persons, corporate bodies, places, topics, works and events via unique identifiers. It allows for better accessibility and consistency of information while making cataloging more maintainable. Even though authority data like the German Integrated Authority File (GND) is widely used today, it is still a challenge to bundle information and resources about an entity that can be found in discovery systems, bibliographies and data collections on the web. Introduced by Jakob Voß and Mathias Schindler in an early version on Wikipedia in 2010, BEACON serves as a data interchange format in order to interlink websites that use such authority data. BEACON files contain concordances of URLs of an authority file like GND and URLs of e.g. a discovery system so that further resources about an entity can be found via direct links.

The Virtual International Authority File (VIAF) uses a similar approach by offering, for a given entity, corresponding links to the authority files of national libraries. However, a BEACON file does not merge all instances of an entity into a single virtual authority record. Instead, it offers a set of links to corresponding authority records for one website in the form of a link dump. This way, owners of discovery systems or data collections on the web can inform users and other interested parties which authority records can be found on their website and how to access them.

Use cases

While BEACON is mostly used to exchange links to GND authority records for persons, it can also be used for other entity types such as corporate bodies or places. The format itself is not restricted to use with GND either. It could easily be applied to other authority files like ORCID or GeoNames.

As the owner of a discovery system, you can generate BEACON files from your data or even offer a link resolver that forwards to the corresponding data entry. Data aggregators can also use BEACON files to aggregate information about an entity on a dedicated website. Furthermore, you can use BEACON files from other providers yourself to offer see-also services. Publishing BEACON files improves your visibility, because other institutions and larger databases can then link to your own discovery system.

In the FID DK search portal, we aggregate metadata from the performing arts domain, including relevant libraries, archives and museums. In addition to metadata about resources like audiovisual material, costume drafts or playbills, we offer fact sheets about persons (Fig. 1), corporate bodies, events and works. Where GND identifiers are available, this information is displayed via the lobid-gnd API. The “Additional Links” section on the right is based on EntityFacts, which itself is implemented using BEACON files.

(Fig. 1: Fact sheet about Louise Dumont (GND 118681192) in the FID DK search portal that is displayed on-the-fly via lobid API. Related resources and events are given as additional information to the user.)

An extensive list of institutions offering BEACON files for their resources, collections or bibliographies about persons as well as other entity types can be found on Wikipedia (in German). We also found other Specialised Information Services that offer BEACON files for persons in their discovery systems, e.g. Jewish Studies, IxTheo and musiconn.

BEACON format

BEACON is designed to be a simple, line-oriented text format that is encoded in UTF-8 and ends with a line break. The following sections describe the two parts of a BEACON file: meta fields and links. The description is based on the latest version of the BEACON specification found on GitHub.

Set of meta fields

BEACON files start with meta fields: lines with a #-prefix and a field name in upper-case letters that describe the context of the file and how links are constructed. Mandatory meta fields are:

  • #FORMAT: BEACON – Marker for BEACON file
  • #PREFIX: SourceURL – in most cases the URL prefix of the GND (http://d-nb.info/gnd/) or of a similar source authority file
  • #TARGET: TargetURL{ID} – a URL template for the address of the corresponding target page. Replacing {ID} in the target URL with an identifier from the data lines has to yield a direct link to an entity or to a hit list for that entity.
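
The list of mandatory fields can be checked mechanically. The following is a minimal Python sketch for illustration only: the names read_meta_fields and missing_mandatory are hypothetical, and the pattern only covers plain "#NAME: value" lines as described here, not every detail of the specification.

```python
import re

# Mandatory fields as listed in this post.
MANDATORY = ("FORMAT", "PREFIX", "TARGET")

# A meta line looks like "#NAME: value" with an upper-case field name.
META_LINE = re.compile(r"^#([A-Z]+):\s*(.*)$")


def read_meta_fields(text: str) -> dict[str, str]:
    """Collect the meta fields that precede the first link line."""
    meta = {}
    for line in text.splitlines():
        match = META_LINE.match(line.strip())
        if match is None:
            if line.strip():
                # First non-empty line that is not a meta field: links start here.
                break
            continue
        meta[match.group(1)] = match.group(2).strip()
    return meta


def missing_mandatory(meta: dict[str, str]) -> list[str]:
    """Return the mandatory field names that are absent or empty."""
    return [name for name in MANDATORY if not meta.get(name)]


# Deliberately incomplete sample: #TARGET is left out.
sample = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
118681192
"""
meta = read_meta_fields(sample)
print(missing_mandatory(meta))
```

For the incomplete sample, the check reports TARGET as missing.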

Optional meta fields provide context information like the name and contact information of the institution that provides the authority records (#NAME, #CONTACT, #INSTITUTION), a description of the contents (#DESCRIPTION) or temporal information like the date of creation or the update frequency (#TIMESTAMP, #UPDATE). The order of meta fields is not relevant as long as they precede the set of links. You can further define the type of relation between source and target identifier via an RDF-based ontology. The default value is rdfs:seeAlso,

  • #RELATION: http://www.w3.org/2000/01/rdf-schema#seeAlso

but you can also use owl:sameAs or foaf:isPrimaryTopicOf. This is why BEACON can also be used to generate RDF triples like the following:

@prefix gnd: <http://d-nb.info/gnd/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

gnd:119062887 rdfs:seeAlso <http://performing-arts.eu/agent/gnd_119062887> .
gnd:116820446 rdfs:seeAlso <http://performing-arts.eu/agent/gnd_116820446> .
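
How such triples could be derived from the meta fields and identifiers is sketched below in Python. This is an illustration under assumptions, not part of the BEACON tooling: the function name beacon_to_ntriples is hypothetical, and it writes N-Triples with full IRIs instead of the prefixed Turtle shown above, which expresses the same statements.

```python
def beacon_to_ntriples(prefix, target, relation, identifiers):
    """Yield one N-Triples statement per identifier."""
    for identifier in identifiers:
        # Source IRI: #PREFIX value followed by the identifier.
        source = prefix + identifier
        # Target IRI: #TARGET template with {ID} replaced by the identifier.
        destination = target.replace("{ID}", identifier)
        yield f"<{source}> <{relation}> <{destination}> ."


lines = list(
    beacon_to_ntriples(
        prefix="http://d-nb.info/gnd/",
        target="http://performing-arts.eu/agent/gnd_{ID}",
        relation="http://www.w3.org/2000/01/rdf-schema#seeAlso",
        identifiers=["119062887", "116820446"],
    )
)
print("\n".join(lines))
```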

For a full list of optional meta fields, take a look at the BEACON specification or the BEACON format description (in German) on Wikipedia.

The links follow below the meta fields, one authority identifier per line. There are three ways to state the identifiers, depending on the use case. The simplest is to provide only one identifier per line, which means that relevant information on this source identifier is available on the target website under the same identifier. A BEACON file with only mandatory meta fields and identifier lines would look like this:

#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: http://performing-arts.eu/agent/gnd_{ID}
118681192
116820446
(...)

By means of this BEACON file, further information about the entity at the source URL http://d-nb.info/gnd/118681192 can be found at the target URL http://performing-arts.eu/agent/gnd_118681192, which can be seen in Fig. 1.
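
The link construction described above can be sketched in a few lines of Python. This is a simplified illustration that only handles the one-identifier-per-line case; the function name expand_links is hypothetical, and a file without #PREFIX or #TARGET would raise a KeyError here rather than being handled gracefully.

```python
def expand_links(beacon_text):
    """Return (source_url, target_url) pairs for a simple BEACON file."""
    meta, pairs = {}, []
    for raw in beacon_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Meta field: split "#NAME: value" at the first colon only,
            # so colons inside URLs stay part of the value.
            name, _, value = line[1:].partition(":")
            meta[name.strip()] = value.strip()
            continue
        # Link line: the identifier is appended to the prefix and
        # substituted for {ID} in the target template.
        source = meta["PREFIX"] + line
        target = meta["TARGET"].replace("{ID}", line)
        pairs.append((source, target))
    return pairs


example = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: http://performing-arts.eu/agent/gnd_{ID}
118681192
116820446
"""
pairs = expand_links(example)
for source, target in pairs:
    print(source, "->", target)
```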

You can also state the number of hits for relevant material about an entity in a discovery system by adding it after one vertical bar:

118681192|708
116820446|10
(...)

The third option is to provide the address explicitly after two vertical bars:

118681192||https://en.wikipedia.org/wiki/Louise_Dumont

or, if the URL template is given in #TARGET, simply:

118681192||Louise_Dumont

For further constraints and conventions, please consult the BEACON specification.
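
The three ways of writing a link line can be told apart by counting the vertical bars. The Python sketch below follows only the description in this post and ignores the additional rules of the specification. The names Link, parse_link_line and resolve_target are hypothetical, and the Wikipedia template in the usage lines is an assumed #TARGET value for the last example.

```python
from typing import NamedTuple, Optional


class Link(NamedTuple):
    source_id: str
    hits: Optional[int]    # number after one vertical bar, if given
    target: Optional[str]  # explicit target after two vertical bars, if given


def parse_link_line(line: str) -> Link:
    """Split a link line into identifier, hit count and explicit target."""
    parts = [part.strip() for part in line.strip().split("|")]
    hits = None
    target = None
    if len(parts) >= 2 and parts[1].isdigit():
        hits = int(parts[1])
    if len(parts) == 3:
        target = parts[2] or None
    return Link(parts[0], hits, target)


def resolve_target(link: Link, target_template: str) -> str:
    """Build the target URL for a parsed link line."""
    token = link.target or link.source_id
    if token.startswith(("http://", "https://")):
        # A full address was given explicitly; use it unchanged.
        return token
    return target_template.replace("{ID}", token)


# Assumed template for the Wikipedia example.
template = "https://en.wikipedia.org/wiki/{ID}"
for text in ("118681192", "118681192|708", "118681192||Louise_Dumont"):
    link = parse_link_line(text)
    print(link, resolve_target(link, template))
```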

Implementation in the FID DK

In order to provide BEACON files in the FID DK search portal, two main requirements had to be fulfilled:

  • the metadata needs to contain authority data, preferably based on GND identifiers
  • entities need to be accessible via permanent URIs

As we harvest metadata from many data providers that already make heavy use of GND identifiers, the first requirement was easily met. Admittedly, even if we exclude those 40% of our data providers who don’t use GND at all, only 55% of the records contain at least one GND identifier. So there is still much enrichment work to do, but matching entities to GND is a topic for a future blog post of its own.

The VuFind-based search portal of FID DK already offered the possibility to store authority records in a Solr authority index and to generate routes for them. In the FID DK, we chose to identify GND entities in the authority index via their GND number, which led to authority routes that already contain GND numbers. Our BEACON file for persons looks like this:

#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: http://performing-arts.eu/agent/gnd_{ID}
#VERSION: 0.1
#FEED: http://performing-arts.eu/docs/beacon_persons.txt
#RELATION: http://www.w3.org/2000/01/rdf-schema#seeAlso
#CONTACT: Fachinformationsdienst Darstellende Kunst <redaktion@performing-arts.eu>
#INSTITUTION: Universitätsbibliothek Johann Christian Senckenberg
#NAME: Personennormdaten im Fachinformationsdienst Darstellende Kunst
#TIMESTAMP: 2021-11-23
119062887
1020247975
120465272
(...)

The BEACON file itself is generated from our underlying BaseX database with an XQuery script as part of our data pipeline. A simple text file that contains the stable meta fields is extended with dynamic temporal information like the timestamp and with the relevant identifiers that are queried from our authority databases for each entity type.
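
The same assembly step can be illustrated in Python. This is not the XQuery script used in the FID DK pipeline, only a sketch of the idea: build_beacon is a hypothetical function, the identifier list stands in for the result of a database query, and removing duplicate identifiers is an added assumption.

```python
from datetime import date

# Stable meta fields, shortened to the mandatory ones for this sketch.
STABLE_META = """#FORMAT: BEACON
#PREFIX: http://d-nb.info/gnd/
#TARGET: http://performing-arts.eu/agent/gnd_{ID}
"""


def build_beacon(stable_meta, identifiers, timestamp):
    """Combine stable meta fields, a timestamp and identifier lines."""
    lines = stable_meta.strip().splitlines()
    # Dynamic part: the date on which the file was generated.
    lines.append(f"#TIMESTAMP: {timestamp.isoformat()}")
    # dict.fromkeys drops duplicates while keeping first-seen order.
    lines.extend(dict.fromkeys(identifiers))
    # The file ends with a line break.
    return "\n".join(lines) + "\n"


queried_ids = ["119062887", "1020247975", "119062887"]  # stand-in for a query result
beacon = build_beacon(STABLE_META, queried_ids, date(2021, 11, 23))
print(beacon, end="")
```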

We decided against stating the number of hits for an entity because, for example, a person entity can be related not only to resources but also to events or works that the person has contributed to. See Fig. 1 for the 652 events that Louise Dumont was involved in and that are known to the discovery system on the basis of the provided metadata.

Even though BEACON files for other entity types like corporate bodies, works and events are not as widely used as person BEACON files, we decided to provide these types as well. We could generate them from the metadata in the same way as the person BEACON files, and permanent URLs for them are also available in the search portal. As we currently don’t offer permanent URLs for places and topics, those entities are not yet publicly accessible. Links to our BEACON files can be found here.