
Table transfer stops after first iteration in Airflow

I use the following code to transfer a small table from database A to database B with Airflow (MWAA):

import os

import pandas as pd
from airflow.providers.postgres.hooks.postgres import PostgresHook


def move_data(sql_file_name, target_tbl_name, target_schema_name):
    # Read the SELECT statement from a file next to this DAG; escape '%'
    # so the DB API driver does not treat it as a parameter placeholder.
    dir_path = os.path.dirname(os.path.realpath(__file__))
    with open(dir_path + '/' + sql_file_name, 'r') as file:
        select_stmt = file.read().replace('%', '%%')

    src = PostgresHook(postgres_conn_id="A")
    src_engine = src.get_sqlalchemy_engine().connect()
    dest = PostgresHook(postgres_conn_id="B")
    dest_engine = dest.get_sqlalchemy_engine().connect()

    # Stream the source table in chunks and write each chunk to the target.
    for chunk in pd.read_sql(select_stmt, src_engine, chunksize=30000):
        print('rows = {0}, columns = {1}'.format(chunk.shape[0], chunk.shape[1]))
        try:
            chunk.to_sql(name=target_tbl_name, con=dest_engine,
                         schema=target_schema_name, chunksize=30000,
                         if_exists='replace', index=False, method='multi')
        except Exception as e:
            print(e)
    dest_engine.execute('commit;')
    dest_engine.close()

However, the code only loops once and does not transfer any records; only the table's schema is created in the target database. The table has around 50,000 records, and tweaking the chunksize does not help. There are no errors in the logs.

The same code works fine when executed in a Jupyter notebook, without using Airflow Hooks.

Any suggestions as to what the issue might be?

For anyone who lands here: the issue was with chunksize=30000. If the chunk size is too large, the MWAA workers run out of resources and exit, and the function fails silently. Decreasing the chunksize helped.
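A minimal sketch of the adjusted loop, assuming a reduced chunk size of 5000 (the exact value is an assumption and depends on the worker's available memory). It also switches to if_exists='append' after the first chunk, since replacing the table on every iteration would keep only the last chunk:

# Sketch only: 5000 is an assumed chunk size; tune it to the MWAA worker's memory.
for i, chunk in enumerate(pd.read_sql(select_stmt, src_engine, chunksize=5000)):
    chunk.to_sql(name=target_tbl_name, con=dest_engine,
                 schema=target_schema_name, chunksize=5000,
                 # Replace the table only for the first chunk, then append,
                 # so earlier chunks are not overwritten.
                 if_exists='replace' if i == 0 else 'append',
                 index=False, method='multi')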
