
Workflow system for both ETL and Queries by Users

I am looking for a workflow system that supports the following needs:

  1. handles a complex ETL pipeline with various kinds of APIs (file-based, REST, console, databases, ...)
  2. offers automated scheduling/orchestration on different execution environments (AWS, Azure, on-premise clusters, local machine, ...)
  3. has an option for "reactive" workflows, i.e. workflows that can be triggered and executed immediately, run with the highest priority, and can be started several times simultaneously

The third requirement in particular seems hard to satisfy. Its purpose is that a user should be able to send a query that activates a (computationally light) workflow and get a result back immediately, instead of waiting seconds or even minutes. Multiple users might also want to run the same workflow at the same time. This matters because the ETL workflows and the user ("reactive") workflows overlap substantially, and I want to reuse parts of these workflows instead of maintaining two sets of workflows executed by different tools.

Apache Airflow appears to be the natural choice for requirements 1 and 2, but it does not seem to support the third requirement, since it starts execution in (lengthy) fixed time slots and does not allow simultaneous execution of several instances of the same DAG (workflow).

Are there any tools that support all of these requirements, or do I have to use two different workflow management tools, or even fall back to a (Python) script for the user workflows?

You can trigger a DAG manually by using the CLI or the API. Have a look at this post: https://medium.com/@ntruong/airflow-externally-trigger-a-dag-when-a-condition-match-26cae67ecb1a

You'll have to test whether you can execute multiple DAG runs at the same time.
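As a rough illustration of the API route, here is a minimal sketch that builds the HTTP requests for starting several runs of the same DAG. It assumes an Airflow 2.x deployment exposing the stable REST API under `/api/v1` (newer versions may use a different path); the host `http://localhost:8080`, the DAG id `user_query`, and the helper name `build_trigger_request` are hypothetical. Authentication is left out because it depends on how the deployment is configured. Whether the runs actually execute in parallel also depends on scheduler and DAG settings such as `max_active_runs`, so this still needs testing.

```python
import json
import uuid
from urllib.parse import quote
from urllib.request import Request


def build_trigger_request(base_url, dag_id, conf=None, run_id=None):
    """Build (but do not send) a POST request asking Airflow to start one DAG run.

    Hypothetical helper: base_url and dag_id are placeholders for your setup.
    """
    # Each run needs its own id; generate a unique one if none is given.
    run_id = run_id or f"manual__{uuid.uuid4()}"
    # Assumed endpoint of the Airflow 2.x stable REST API.
    url = f"{base_url.rstrip('/')}/api/v1/dags/{quote(dag_id, safe='')}/dagRuns"
    payload = {"dag_run_id": run_id, "conf": conf or {}}
    return Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Two users asking for the same workflow at (almost) the same time.
    for user in ("alice", "bob"):
        req = build_trigger_request(
            "http://localhost:8080", "user_query", conf={"user": user}
        )
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
        # To actually send it, add an auth header suited to your deployment
        # and call urllib.request.urlopen(req).
```

Sending both requests and then checking the run states in the Airflow UI is one way to test whether the runs overlap or queue behind each other.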
