
How to run a Dataflow job with Cloud Composer

I know Apache Beam and I am able to create pipelines with it, and I also know which operator in Cloud Composer to use to run a Dataflow job. What I want to know is how to convert plain Apache Beam code into a Dataflow job so that I can run it using Cloud Composer: what settings and configuration will I need? I did not find the Google docs very useful, so please help me. My requirement is to read a CSV file from Cloud Storage, load it into BigQuery using Dataflow, and then schedule the job with Cloud Composer. I am using Python.

Some tentatively useful GCP docs can be found here: https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator

But, in general, if you already have the Beam pipeline written (and it works), then you would want to specify the "Dataflow" runner in the pipeline options.
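For example, a minimal sketch of a CSV-to-BigQuery pipeline with the Dataflow runner set in its options could look like this (the project, region, bucket, table, and the `name,age` column layout are all placeholders you would replace with your own):

```python
# Minimal sketch: read a CSV from GCS and write rows to BigQuery on Dataflow.
# All project/bucket/table names below are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_csv_line(line):
    # Assumes a simple "name,age" layout with no quoted fields.
    name, age = line.split(",")
    return {"name": name, "age": int(age)}


def run():
    options = PipelineOptions(
        runner="DataflowRunner",            # switch from DirectRunner to Dataflow
        project="my-gcp-project",           # placeholder project ID
        region="us-central1",               # placeholder region
        temp_location="gs://my-bucket/temp",
        staging_location="gs://my-bucket/staging",
    )

    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadCSV" >> beam.io.ReadFromText(
                "gs://my-bucket/input/data.csv", skip_header_lines=1)
            | "ParseLines" >> beam.Map(parse_csv_line)
            | "WriteToBQ" >> beam.io.WriteToBigQuery(
                "my-gcp-project:my_dataset.my_table",
                schema="name:STRING,age:INTEGER",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )


if __name__ == "__main__":
    run()
```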

For a 'custom' Dataflow job, you likely want the following Operator --> https://airflow.apache.org/docs/apache-airflow/1.10.6/_api/airflow/contrib/operators/dataflow_operator/index.html#airflow.contrib.operators.dataflow_operator.DataFlowPythonOperator

I'm sure you are aware that Cloud Composer is managed Airflow, so you can use 'regular' Airflow operators.
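Putting it together, a minimal DAG sketch using that `DataFlowPythonOperator` could look like the following. The file path assumes the Beam script above is uploaded next to your DAGs in the Composer bucket (mounted at `/home/airflow/gcs/dags` on Composer workers); the project, region, bucket, and schedule are placeholders:

```python
# Minimal DAG sketch: launch the Beam pipeline file on Dataflow from Composer.
# Paths, project, and bucket names are placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.dataflow_operator import DataFlowPythonOperator

default_args = {
    "owner": "airflow",
    "start_date": datetime(2020, 1, 1),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}

with DAG(
    dag_id="csv_to_bigquery_dataflow",
    default_args=default_args,
    schedule_interval="@daily",   # run once per day; adjust as needed
    catchup=False,
) as dag:

    run_dataflow = DataFlowPythonOperator(
        task_id="run_csv_to_bq_pipeline",
        # Path to the Beam pipeline file as seen from the Composer workers
        # (the DAGs folder in the Composer bucket is mounted here).
        py_file="/home/airflow/gcs/dags/beam_csv_to_bq.py",
        job_name="csv-to-bq",
        # These get passed to the pipeline as command-line flags.
        dataflow_default_options={
            "project": "my-gcp-project",
            "region": "us-central1",
            "temp_location": "gs://my-bucket/temp",
            "staging_location": "gs://my-bucket/staging",
        },
    )
```

Note that the operator launches the Python file with the Dataflow options (project, temp_location, and so on, plus the Dataflow runner) as command-line flags, so a pipeline launched this way can construct `PipelineOptions()` and let it parse `sys.argv` rather than hardcoding those values in the script.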
