
How can I overwrite (instead of append to) the destination table in Airflow's "DataflowTemplateOperator()"?

I am using Airflow's DataflowTemplateOperator() to migrate data from MSSQL to BigQuery with the JDBC to BigQuery Dataflow template.

By default, it appends data to the destination BigQuery table.

I want to truncate the table first and then write the new rows.

Is there a parameter that switches Dataflow / DataflowTemplateOperator from append to overwrite?

The Dataflow templates are publicly available on GitHub, including the Java Database Connectivity (JDBC) to BigQuery template. Although you cannot modify them directly, you can create a custom template by using the provided source code as a starting point and changing only the WriteDisposition.

If you check the source code of the template you are currently using, you can see that on line 102 the WriteDisposition is WRITE_APPEND rather than WRITE_TRUNCATE. The steps below describe how to create a new template with that change.

  1. To create a new template, copy the source code from Google's GitHub page into a new .java file.
  2. On line 102, change the WriteDisposition from WRITE_APPEND to WRITE_TRUNCATE.
  3. After editing the code, create and stage your template as described in the documentation.
  4. Make sure that the destination table exists in BigQuery and that the template's requirements are met.
  5. Execute the custom template as described in the documentation.
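
On the Airflow side, the only change after staging should be pointing the operator at the custom template's Cloud Storage path instead of the Google-provided one. The following is a minimal sketch, not a definitive implementation: the helper name, the task_id, and every example value are hypothetical, and the parameter keys are assumed to match those of the Google-provided JDBC to BigQuery template, so verify them against the template version you copied.

```python
def custom_template_task_kwargs(template_path, output_table, connection_url,
                                driver_class, driver_jars, query, temp_dir):
    """Build keyword arguments for DataflowTemplateOperator.

    Hypothetical helper: it only assembles a dict, so it can be checked
    without Airflow installed.
    """
    # The operator expects the staged template's location in Cloud Storage.
    if not template_path.startswith("gs://"):
        raise ValueError("template_path must be a gs:// path to the staged template")

    return {
        "task_id": "mssql_to_bigquery_truncate",  # hypothetical task name
        "template": template_path,  # custom template built with WRITE_TRUNCATE
        # Assumed to be the same runtime parameters the original template takes.
        "parameters": {
            "driverClassName": driver_class,
            "connectionURL": connection_url,
            "driverJars": driver_jars,
            "query": query,
            "outputTable": output_table,
            "bigQueryLoadingTemporaryDirectory": temp_dir,
        },
    }


# Example values are placeholders, not real resources.
kwargs = custom_template_task_kwargs(
    template_path="gs://my-bucket/templates/JdbcToBigQueryTruncate",
    output_table="my-project:my_dataset.my_table",
    connection_url="jdbc:sqlserver://10.0.0.1:1433;databaseName=mydb",
    driver_class="com.microsoft.sqlserver.jdbc.SQLServerDriver",
    driver_jars="gs://my-bucket/drivers/mssql-jdbc.jar",
    query="SELECT * FROM my_table",
    temp_dir="gs://my-bucket/tmp",
)
```

Inside a DAG file, the resulting dictionary can then be unpacked into the operator, for example `DataflowTemplateOperator(dag=dag, **kwargs)`. Since the write behaviour is baked into the custom template, no extra append/overwrite flag is passed from Airflow.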
