Tips for cleaner, faster and more maintainable XSLT code

Like many other Specialised Information Services (FID), we are working with XSLT to map XML metadata from data providers to our data model, an extended version of the RDF-XML based Europeana Data Model (EDM). In the FID Performing Arts (FID DK), we currently receive data from 22 data providers, delivered in 6 established metadata standards such as MARC21, EAD and LIDO as well as in 10 provider-specific formats that stem from working with database systems like MS Access or FAUST DB. As most metadata is already delivered in XML, it only made sense to use a programming language like XSLT, whose main purpose is the transformation of XML documents.
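To make the setting concrete, here is a minimal sketch of how such a mapping can be driven from Python with lxml; the file names and the stylesheet are placeholders, not the FID's actual mapping.

```python
from lxml import etree

# Placeholder file names; the real mapping stylesheets are not shown here.
source = etree.parse("provider_marc21.xml")    # a provider's XML export
stylesheet = etree.parse("marc21_to_edm.xsl")  # hypothetical mapping to EDM

# Compile the stylesheet once and reuse it for every input document.
transform = etree.XSLT(stylesheet)
result = transform(source)

# Serialize the transformed record (EDM as RDF-XML).
print(etree.tostring(result, pretty_print=True, encoding="unicode"))
```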

Providing BEACON files in discovery systems

When hearing the term beacon, people might think of flares or a lighthouse at first. A BEACON file, too, can be a guide in the ocean of different authority files on the web. In this post, I will take a closer look at BEACON files, their implementation and why they are a useful addition to discovery systems like the Specialised Information Service Performing Arts (FID DK).

Introduction to BEACON

Authority data disambiguates and represents controlled entities like persons, corporate bodies, places, topics, works and events via unique identifiers. It allows for better accessibility and consistency of information while making cataloging more maintainable. Even though authority data like the German Integrated Authority File (GND) is widely used today, it is still a challenge to bundle information and resources about an entity that are scattered across discovery systems, bibliographies and data collections on the web. Introduced by Jakob Voß and Mathias Schindler in an early version on Wikipedia in 2010, BEACON serves as a data interchange format for interlinking websites that use such authority data. A BEACON file contains a concordance that maps identifiers of an authority file like the GND to URLs of, for example, a discovery system, so that further resources about an entity can be reached via direct links.
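As a rough illustration, the following sketch writes such a concordance as a BEACON file. The identifiers and target URLs are invented, and the format offers further header fields (#TARGET, #CONTACT, #FEED and others) that are not used here.

```python
# Hypothetical mapping from GND identifiers to record URLs of a
# discovery system; both sides are placeholders.
gnd_to_record = {
    "118540238": "https://example.org/records/42",
    "119053071": "https://example.org/records/137",
}

with open("beacon.txt", "w", encoding="utf-8") as f:
    # Header: the #PREFIX field declares the authority file namespace,
    # so the link lines below only need the bare identifiers.
    f.write("#FORMAT: BEACON\n")
    f.write("#PREFIX: http://d-nb.info/gnd/\n")
    f.write("#NAME: Example discovery system\n")
    for gnd_id, url in sorted(gnd_to_record.items()):
        # "<source ID>|<target URL>" is one of the allowed link forms.
        f.write(f"{gnd_id}|{url}\n")
```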

CPU-intensive Python Web Backends with asyncio and multiprocessing, Part II

In the first post of this series, I looked at how to achieve parallel execution in Python using multiprocessing and discussed why this approach does not fit WSGI-based web frameworks: under WSGI, only the web server may create new processes, not the framework. At the end, I mentioned several alternative Python HTTP servers that use asynchronous I/O with an event-loop-based scheduler to handle parallelism.
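As a quick recap, this is the kind of process-based parallelism meant here (a minimal sketch, not code from the first post): each worker is a separate interpreter with its own GIL, so CPU-bound work can run on several cores at once.

```python
from multiprocessing import Pool

def cpu_bound(n: int) -> int:
    # Stand-in for real CPU-heavy backend work.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Four worker processes: the four jobs run on separate cores.
    with Pool(processes=4) as pool:
        print(pool.map(cpu_bound, [10_000_000] * 4))
```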

In this post, we will look at how asynchronous I/O works in general, and specifically how it works in Python.
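To anticipate the core idea with a minimal sketch (the coroutine names and delays are invented), cooperative multitasking on a single event loop looks like this in Python:

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # asyncio.sleep stands in for non-blocking I/O such as a database
    # query or an HTTP request.
    await asyncio.sleep(delay)
    return f"{name} finished after {delay}s"

async def main() -> None:
    # gather() runs both coroutines on the same event loop; while one
    # waits on I/O, the other makes progress.
    print(await asyncio.gather(fetch("a", 1.0), fetch("b", 1.0)))

asyncio.run(main())  # both finish after ~1s total, not ~2s
```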

CPU-intensive Python Web Backends with asyncio and multiprocessing, Part I

Python, popular though it is, has a few well-known weaknesses. One of the best known among serious users of the language is the lack of multicore support for in-process threads. This is because CPython, Python's standard implementation, has a global interpreter lock (often referred to as the GIL). The GIL locks each instance of the interpreter to a single core, a common approach to avoiding race conditions in the implementation of language interpreters.
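A small experiment makes the effect visible (a sketch, assuming a current CPython; the workload is arbitrary): running the same pure-Python function in two threads takes about as long as running it twice sequentially, because the threads take turns holding the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def burn(n: int) -> int:
    # Pure-Python CPU work; threads cannot run this in parallel under the GIL.
    total = 0
    for i in range(n):
        total += i * i
    return total

N = 5_000_000

start = time.perf_counter()
for _ in range(2):
    burn(N)
print(f"sequential:  {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(burn, [N, N]))
print(f"two threads: {time.perf_counter() - start:.2f}s")
```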

Getting the (Semantic) Sense out of a User Query

The BIOfid Semantic Search

Within the BIOfid project, we are creating a semantic search portal (hereafter “BIOfid portal”) to help our users access legacy biodiversity literature more easily. Because the BIOfid portal has a deeper “understanding” of both the texts and the species they mention, it allows users to retrieve more relevant documents. Moreover, the BIOfid portal interprets the user query and transforms it ad hoc into a graph database query in order to learn more about the user's intention.
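Purely to illustrate the idea (this is a hypothetical sketch, not the BIOfid portal's actual pipeline; the vocabulary, properties and graph schema are invented), such a translation from a search term into a graph query might look like this:

```python
# Hypothetical translation of a user's search term into a SPARQL query
# over an invented taxon graph; the "ex:" properties are placeholders.
def term_to_sparql(term: str) -> str:
    safe = term.replace('"', '\\"')  # keep the string literal well-formed
    return f"""
PREFIX ex: <http://example.org/ns#>
SELECT ?doc WHERE {{
  ?taxon ex:scientificName|ex:synonym "{safe}" .
  ?doc   ex:mentionsTaxon ?taxon .
}}"""

print(term_to_sparql("Quercus robur"))
```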