We are trying to move from Pentaho Kettle to Apache Airflow for ETL, centralizing all data processes under one tool.
We use Kettle to read data daily from Postgres/MySQL databases and move it to S3 -> Redshift.
What is the easiest way to do this? I do not see an operator that can do this directly, so should I use the MySQL/Postgres operator to put the data in a local file, and then use the S3 operator to move the data to S3?
Thank you
You can build your own 'mysql_to_s3' operator and add it as a plugin to Airflow.
There is an operator that archives data from MySQL to GCS (`MySqlToGoogleCloudStorageOperator` in Airflow's contrib package). You can keep almost all of its code, changing only `_upload_to_gcs` to use the S3 hook (`s3_hook.py`) instead. See the Airflow documentation on custom plugins for how to register the new operator.
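The core of such a custom operator is just three steps: run the query, serialize the result set, upload the bytes. Below is a minimal sketch of the serialization step, with the Airflow hook calls shown only in comments; the helper function and parameter names here are illustrative, not part of any Airflow API.

```python
import csv
import io


def rows_to_csv_bytes(headers, rows):
    """Serialize a query result set to UTF-8 CSV bytes, ready for an S3 upload."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(headers)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


# Inside a custom operator's execute(), Airflow hooks would do the actual I/O.
# Sketch only -- hook signatures vary across Airflow versions:
#
#   records = MySqlHook(mysql_conn_id=self.mysql_conn_id).get_records(self.sql)
#   S3Hook(aws_conn_id=self.aws_conn_id).load_bytes(
#       rows_to_csv_bytes(self.headers, records),
#       key=self.s3_key, bucket_name=self.s3_bucket, replace=True)

payload = rows_to_csv_bytes(["id", "name"], [(1, "alice"), (2, "bob")])
```

The same helper works unchanged for a Postgres source, since only the hook that fetches the records differs.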
airflow-plugins (by Astronomer) has a MySqlToS3Operator that will take the result set of a MySQL query and place it on S3 as either CSV or JSON.
The plugin can be found here: https://github.com/airflow-plugins/mysql_plugin/blob/master/operators/mysql_to_s3_operator.py
From there you might be able to use the s3_to_redshift operator to load the data from S3 into Redshift: https://airflow.readthedocs.io/en/latest/_modules/airflow/operators/s3_to_redshift_operator.html
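Under the hood, an s3_to_redshift-style operator just issues a Redshift `COPY` statement against the uploaded file. The sketch below builds such a statement so you can see what the operator runs; the function and its parameters are hypothetical (the real operator assembles its SQL differently and varies by Airflow version), and the bucket/role values are placeholders.

```python
def build_copy_sql(schema, table, bucket, key, credentials, copy_options=()):
    """Build a Redshift COPY statement loading an S3 object into a table.

    Sketch of what an s3_to_redshift-style operator executes; not the
    actual implementation from any Airflow release.
    """
    options = "\n".join(copy_options)
    return (
        f"COPY {schema}.{table}\n"
        f"FROM 's3://{bucket}/{key}'\n"
        f"CREDENTIALS '{credentials}'\n"
        f"{options};"
    )


# Example with placeholder values:
sql = build_copy_sql(
    "public", "orders",
    "my-bucket", "exports/orders.csv",
    "aws_iam_role=arn:aws:iam::123456789012:role/redshift-copy",
    ("CSV", "IGNOREHEADER 1"),
)
```

Chaining the MySqlToS3Operator and the s3_to_redshift operator in one DAG gives you the full MySQL -> S3 -> Redshift pipeline you are replacing Kettle with.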