I need to execute a SQL query on many databases (about 10K) that share the same schema, and insert the results into a separate host, using Airflow.
Do you have any idea how I should design my DAG in the most efficient way for this kind of project?
Any help would be much appreciated!
Create one Airflow connection per database.
Then define a list of those connection id strings.
Then generate the same task once per connection id, e.g. with a list comprehension.
E.g. with MySqlOperator (see also MsSqlOperator or PostgresOperator):
# In Airflow 2.x the import is
# from airflow.providers.mysql.operators.mysql import MySqlOperator
from airflow.operators.mysql_operator import MySqlOperator

conns = ('db1', 'db2', 'db3')  # one Airflow connection id per database
tasks = [
    MySqlOperator(
        sql="show tables;",
        task_id="update_" + conn,
        mysql_conn_id=conn,
    )
    for conn in conns
]
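The operators above only run the query on each source; the question also asks to load the results into a separate host. One way is a PythonOperator per connection whose callable fetches with a source hook and writes with a target hook (e.g. MySqlHook(conn_id).get_records for reading and a target hook's insert_rows for writing). A hedged sketch of that callable's core logic, kept free of Airflow imports so it is easy to unit test (the name transfer and the two callables are illustrative, not an Airflow API):

def transfer(query, fetch_rows, insert_rows):
    """Run `query` via fetch_rows (e.g. a source hook's get_records)
    and hand the resulting rows to insert_rows (e.g. a hook bound to
    the target host). Returns the number of rows moved."""
    rows = fetch_rows(query)
    insert_rows(rows)
    return len(rows)


# Usage with stand-in callables; a real DAG would bind database hooks:
rows_moved = transfer(
    "SELECT id, name FROM users",
    fetch_rows=lambda q: [(1, "a"), (2, "b")],
    insert_rows=lambda rows: None,
)
# rows_moved == 2

Injecting the two callables also makes it easy to cap fan-out: with 10K connections you would typically limit concurrency via the DAG's concurrency setting or a pool rather than letting all tasks run at once.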