
Scheduling a Cloud Dataflow job

I have already created a job in Dataflow that runs an ETL pipeline from PostgreSQL to BigQuery. However, I don't know how to create a schedule for it using Airflow. Can anyone share how to schedule a Dataflow job using Airflow?

Thank you

You can schedule Dataflow batch jobs using Cloud Scheduler (a fully managed cron job scheduler) or Cloud Composer (a fully managed workflow orchestration service built on Apache Airflow).

To schedule with Cloud Scheduler, refer to Schedule Dataflow batch jobs with Cloud Scheduler.
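As a rough sketch of the Cloud Scheduler approach, assuming your pipeline is staged as a classic Dataflow template (the project ID, region, bucket, template path, job name, and service account below are all placeholders, not values from the question):

```shell
# Create a Cloud Scheduler job that launches a classic Dataflow template
# every day at 03:05 by calling the Dataflow REST API (templates:launch).
# PROJECT_ID, REGION, the GCS template path, and the service account
# are placeholders -- substitute your own values.
gcloud scheduler jobs create http dataflow-etl-nightly \
  --schedule="5 3 * * *" \
  --http-method=post \
  --uri="https://dataflow.googleapis.com/v1b3/projects/PROJECT_ID/locations/REGION/templates:launch?gcsPath=gs://BUCKET/templates/postgres_to_bq" \
  --oauth-service-account-email="scheduler-sa@PROJECT_ID.iam.gserviceaccount.com" \
  --message-body='{"jobName": "postgres-to-bq-etl", "parameters": {}}'
```

The service account used here needs permission to launch Dataflow jobs (for example the Dataflow Admin role) for the launch call to succeed.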

To schedule with Cloud Composer, refer to Launching Dataflow pipelines with Cloud Composer using DataflowTemplateOperator.

For examples and more ways to run Dataflow jobs in Airflow using the Java/Python SDKs, refer to Google Cloud Dataflow Operators.

In your Airflow DAG, you can define a cron-based schedule with the schedule_interval parameter:

import airflow
from airflow.providers.apache.beam.operators.beam import BeamRunPythonPipelineOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowConfiguration

with airflow.DAG(
        'my_dag',
        default_args=args,
        schedule_interval="5 3 * * *"  # every day at 03:05
) as dag:

    # Trigger the Dataflow job with an operator
    launch_dataflow_job = BeamRunPythonPipelineOperator(
        runner='DataflowRunner',
        py_file=python_main_file,
        task_id='launch_dataflow_job',
        pipeline_options=dataflow_job_options,
        py_system_site_packages=False,
        py_interpreter='python3',
        dataflow_config=DataflowConfiguration(
            location='region'  # your Dataflow region, e.g. where the job should run
        )
    )

    launch_dataflow_job
    ......
