
Airflow: How to run same query on multiple databases

I need to execute a SQL query on multiple databases (10K) that share the same schema and insert the results into a separate host using Airflow.

Do you have any idea how I should design my DAG in the most efficient way for this kind of project?

Any help would be much appreciated!

Create one Airflow connection per database.
Then define a list of those connection ID strings.
Then define the same task once for each connection ID in that list.

For example, with MySqlOperator (see also MsSqlOperator or PostgresOperator):

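# The import path depends on the installed Airflow version; this is the
# Airflow 2 MySQL provider path (older releases: airflow.operators.mysql_operator).
from airflow.providers.mysql.operators.mysql import MySqlOperator

# Define the tasks inside a DAG context, e.g. a `with DAG(...)` block.
conns = ('db1', 'db2', 'db3')
tasks = [MySqlOperator(sql="""
show tables;
""",
                       task_id="update_" + conn,  # one uniquely named task per connection
                       mysql_conn_id=conn,
          ) for conn in conns]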
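
With 10K databases the `conns` tuple would probably be generated rather than typed by hand. The sketch below is plain Python with no Airflow dependency and shows one possible way to build the connection ID list and the matching task IDs. The `db<N>` naming scheme and the helper names `build_conn_ids` and `task_id_for` are assumptions made for illustration; the real IDs must match whatever connections are registered in Airflow.

```python
def build_conn_ids(count, prefix="db"):
    """Return connection IDs prefix1..prefixN (hypothetical naming scheme)."""
    return [f"{prefix}{i}" for i in range(1, count + 1)]


def task_id_for(conn_id):
    """Mirror the task_id pattern used in the answer: "update_" + conn."""
    return "update_" + conn_id


# Assumed count of 10,000, taken from the "10K" in the question.
conns = build_conn_ids(10_000)
task_ids = [task_id_for(conn) for conn in conns]

# Task IDs must be unique within a DAG, so check before building the tasks.
assert len(set(task_ids)) == len(task_ids)
```

The resulting `conns` list can be used in place of the three-element tuple in the list comprehension above.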
