
ETL vs. workflow management: which should I apply? Can they be used interchangeably?

I am setting up a data pipeline for a client. I have spent a number of years on the analysis side of things, but I am now working with a small shop that really only has a production environment. The first thing we did was create a replicated instance of production, but I would like to apply a data warehouse mentality to make the analysis portion easier.

My question comes down to which tool to use, and why. I have been looking at solutions like Talend for ETL, but I am also very interested in Airflow. The problem is that I'm not sure which suits my needs better. I would like to create and monitor jobs easily (I write Python pretty fluently, so Airflow job creation isn't an issue) but also be able to transform the data as it comes in.

Any suggestions are much appreciated.

Please consider that the open-source edition of Talend (Talend Open Studio) does not provide any monitoring or scheduling capabilities. It is only a "code generator". The more sophisticated infrastructure is part of the enterprise editions.

For anyone who sees this: four years later, what we have done is leverage Airflow for scheduling, Fivetran and/or Stitch for extraction and loading, and dbt for transformations.
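To make that division of labour concrete, here is a rough sketch in plain Python. It is not Airflow, Fivetran, Stitch, or dbt code: it only uses the standard library (`graphlib`) as a stand-in for the scheduler, and every name in it (`RAW_ORDERS`, `extract_load`, `transform`, `report`, `run_pipeline`) is made up for illustration. The point is that the workflow manager only orders and monitors the steps, while the extract/load and transform steps do the actual data work.

```python
from graphlib import TopologicalSorter

# Hypothetical rows standing in for the replicated production database.
RAW_ORDERS = [
    {"id": 1, "amount": "19.90", "status": "paid"},
    {"id": 2, "amount": "5.00", "status": "refunded"},
    {"id": 3, "amount": "42.10", "status": "paid"},
]


def extract_load(ctx):
    """EL step: copy raw rows unchanged (the role Fivetran/Stitch play)."""
    ctx["raw_orders"] = [dict(row) for row in RAW_ORDERS]


def transform(ctx):
    """T step: cast types and filter after loading (the role dbt plays)."""
    ctx["paid_orders"] = [
        {"id": row["id"], "amount": float(row["amount"])}
        for row in ctx["raw_orders"]
        if row["status"] == "paid"
    ]


def report(ctx):
    """Analysis step that reads only the transformed data."""
    ctx["paid_total"] = round(sum(r["amount"] for r in ctx["paid_orders"]), 2)


TASKS = {"extract_load": extract_load, "transform": transform, "report": report}

# Each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "extract_load": set(),
    "transform": {"extract_load"},
    "report": {"transform"},
}


def run_pipeline():
    """Workflow-manager role: run tasks in dependency order and log status."""
    ctx, log = {}, []
    for name in TopologicalSorter(DEPENDENCIES).static_order():
        try:
            TASKS[name](ctx)
            log.append((name, "success"))
        except Exception as exc:
            # Stop at the first failure so downstream tasks never run on bad data.
            log.append((name, f"failed: {exc}"))
            break
    return ctx, log


if __name__ == "__main__":
    ctx, log = run_pipeline()
    for name, status in log:
        print(f"{name}: {status}")
    print("paid total:", ctx.get("paid_total"))
```

In a real setup the dependency map and the status log would be handled by the scheduler (a DAG definition and its UI), and the function bodies would be replaced by calls to the extraction service and the transformation tool.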
