
Table transfer stops after first iteration in Airflow

I use the following code to transfer a small table from database A to database B with Airflow (MWAA):

import os

import pandas as pd
from airflow.providers.postgres.hooks.postgres import PostgresHook


def move_data(sql_file_name, target_tbl_name, target_schema_name):
    # Read the SELECT statement from a file next to this DAG; escape '%'
    # so the DB API driver does not treat it as a parameter placeholder.
    dir_path = os.path.dirname(os.path.realpath(__file__))
    with open(dir_path + '/' + sql_file_name, 'r') as file:
        select_stmt = file.read().replace('%', '%%')

    src = PostgresHook(postgres_conn_id="A")
    src_engine = src.get_sqlalchemy_engine().connect()
    dest = PostgresHook(postgres_conn_id="B")
    dest_engine = dest.get_sqlalchemy_engine().connect()

    # Stream the source table in chunks and write each chunk to the target.
    for chunk in pd.read_sql(select_stmt, src_engine, chunksize=30000):
        print('rows = {0}, columns = {1}'.format(chunk.shape[0], chunk.shape[1]))
        try:
            chunk.to_sql(name=target_tbl_name, con=dest_engine,
                         schema=target_schema_name, chunksize=30000,
                         if_exists='replace', index=False, method='multi')
        except Exception as e:
            print(e)
    dest_engine.execute('commit;')
    dest_engine.close()

However, the code only loops once and does not transfer any records; only the table's schema is created in the target database. The table has around 50,000 records, and tweaking the chunksize does not help. There are no errors in the logs.

The same code works fine when executed in a Jupyter notebook, without using Airflow Hooks.

Any suggestions as to what the issue might be?

For anyone who lands here: the issue was with chunksize=30000. If the chunk size is too large, the MWAA workers run out of resources and exit, and the function fails silently. Decreasing the chunksize helped.
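A minimal sketch of the adjusted loop, assuming a reduced chunk size of 5000 (the exact value is an assumption and depends on the worker's available memory). It also switches to if_exists='append' after the first chunk, since replacing the table on every iteration would keep only the last chunk:

# Sketch only: 5000 is an assumed chunk size; tune it to the MWAA worker's memory.
for i, chunk in enumerate(pd.read_sql(select_stmt, src_engine, chunksize=5000)):
    chunk.to_sql(name=target_tbl_name, con=dest_engine,
                 schema=target_schema_name, chunksize=5000,
                 # Replace the table only for the first chunk, then append,
                 # so earlier chunks are not overwritten.
                 if_exists='replace' if i == 0 else 'append',
                 index=False, method='multi')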
