
How do I load a CSV into AWS RDS using an Airflow Postgres hook?

I'm trying to use the copy_expert method of the Postgres hook documented here: https://airflow.apache.org/docs/stable/_modules/airflow/hooks/postgres_hook.html, but I don't understand the syntax and I don't have an example to follow. My goal is to load a CSV into an AWS RDS instance running Postgres.

hook_copy_expert = airflow.hooks.postgres_hook.PostgresHook('postgres_amazon')

def import_to_postgres():
    sql = f"DELETE FROM amazon.amazon_purchases; COPY amazon.amazon_purchases FROM '{path}' DELIMITER ',' CSV HEADER;"
    hook_copy_expert(sql, path, open=open)

t4 = PythonOperator(
    task_id='import_to_postgres',
    python_callable=import_to_postgres,
    dag=dag,
)

When I run this, I get an error saying name 'sql' is not defined. Can someone help me understand what I'm doing wrong?

Edit: I got the hook to run but I got an error:

ERROR - must be superuser or a member of the pg_read_server_files role to COPY from a file
HINT:  Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.

I thought the whole point of using the Postgres hook was to use the COPY command in SQL without having superuser status? What am I doing wrong?

You can't run a server-side COPY on RDS: the path in COPY ... FROM '<file>' refers to the database server's own filesystem, and RDS doesn't give you the superuser (or pg_read_server_files) access that reading it requires. And you can't run psql's \copy from a PostgreSQL operator, because \copy is a feature of the psql client, not a SQL command.

Unless it's an enormous file, try loading the CSV data into memory with the Python csv module and then inserting it into the DB.
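For reference, here is a minimal sketch of that approach, assuming Airflow 1.10-style imports, the postgres_amazon connection id from the question, and a placeholder CSV path (the question's actual path isn't shown); the column order in the file is assumed to match the table.

import csv

from airflow.hooks.postgres_hook import PostgresHook

path = '/path/to/amazon_purchases.csv'  # placeholder for the question's `path`

def import_to_postgres():
    hook = PostgresHook(postgres_conn_id='postgres_amazon')

    # Clear the table first, as the original DAG did.
    hook.run('DELETE FROM amazon.amazon_purchases;')

    # Read the whole CSV into memory, skipping the header row.
    with open(path, newline='') as f:
        reader = csv.reader(f)
        next(reader)  # equivalent of CSV HEADER: drop the column names
        rows = list(reader)

    # insert_rows issues plain INSERTs over the client connection,
    # so no superuser or server-side file access is needed.
    hook.insert_rows(table='amazon.amazon_purchases', rows=rows)

insert_rows commits every 1,000 rows by default (the commit_every argument), so a moderately sized file loads in batches; for a truly huge file you'd want to stream it rather than hold it all in memory.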
