Data Engineering with luigi - Lessons learned
Introduction
At the UB JCS, we make extensive use of the Python luigi framework for data engineering. The framework can handle thousands of tasks, resolve non-circular task dependencies, and run for days. Additionally, it provides a convenient web control panel for, e.g., viewing the task dependencies as a tree diagram or starting specific tasks.
Although luigi already supports the user by enforcing a very specific structure, there are still some things to consider when designing a data pipeline with luigi (for a general introduction, see a previous post).