Data Engineering with luigi - Lessons learned

Introduction At the UB JCS, we make extensive usage of the Python luigi framework for data engineering. The framework is capable of handling thousands of tasks, calculating non-circular task dependencies, and run over days. Additionally, it provides a convenient web control panel to see, e.g. the task dependencies in a tree diagram or start specific tasks. Although luigi itself supports the user already by enforcing a very specific structure, there are still some things to consider when designing a data pipeline with luigi (for a general introduction, see in a previous post).

Common engineering strategies in luigi

Introduction For many automated data processing tasks within the context of the Specialised Information Services (FID) at the University Library Frankfurt, we use the Python package luigi. This package proves especially useful when a task (e.g. the loading of data into a database) depends on the work of other tasks that have to run successfully, before the next task starts (e.g. first you need to download the data). luigi orchestrates all required tasks and their respective required task(s) and then processes everything for you.

Using another metadata standard than MARC21 in VuFind, Part II

Introduction In the first post of this series, we covered the necessary steps to populate VuFind’s Solr cores with title and authority records. This second post describes changes to configuration files, as well as modifications that are necessary to interact, display and export records. All our customizations are based on existing VuFind code and, to ensure maintainability, stored in the local/ folder and the custom module Fiddk. We want to remind the reader, that we assume basic VuFind knowledge, which can be acquired from the documentation.

Using another metadata standard than MARC21 in VuFind, Part I

Introduction When you install the open source discovery system VuFind, follow basic configuration steps and feed it with library records, it works well out-of-the-box and provides you with faceted search results and the possibility to browse through your data besides many other features. The easiest way to achieve this, is to load the standard interchange format for library records, i.e. MARC21, into the included Apache Solr-based search index. The FID Performing Arts uses VuFind since 2015, but as mentioned in an earlier post, we receive a vast amount of metadata from performing arts museums and archives in standards other than MARC21, such as EAD, METS/MODS and LIDO as well as other individual data formats which result from database systems like MS Access or FAUST DB.

Upgrading OJS with the ojs_updater

Introduction At the University Library Frankfurt, we currently host 21 OJS journals, with more to come. Since we apply a strategy that runs only a single journal within an OJS instance, we have to maintain 21 different OJS instances. In order to maintain and manage this multiplicity, we found it important to come up with structures on the server and helper tools. Especially the process of updating a journal instance can be quite tedious, since it involves multiple manual steps and can cause problems when forgetting something in the process.