Python Airflow operator

We are currently using an Airflow Python operator to load parquet files from GCS storage to BigQuery. I want to declare all the numeric columns in the source as BIGNUMERIC. Is that possible?

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
)

You can manually define the schema using the schema_fields parameter of GCSToBigQueryOperator instead of relying on autodetect.

Please see the updated code below:

bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    # Each numeric column is declared explicitly as BIGNUMERIC.
    schema_fields=[
        {"name": "sample_col_1", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_2", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_3", "type": "BIGNUMERIC", "mode": "NULLABLE"},
    ],
)
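If the column list is long, the operator also accepts a schema_object parameter, a path to a JSON schema file stored in GCS, as an alternative to listing schema_fields inline. This is a minimal sketch, not part of the original answer; the path schemas/table_schema.json is a hypothetical object in the source bucket:

bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    # Hypothetical GCS path to a JSON file containing the list of
    # {"name": ..., "type": ..., "mode": ...} field dicts.
    schema_object="schemas/table_schema.json",
)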

You may refer to the GCSToBigQueryOperator documentation for more details.
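Since schema_fields must list every column, hand-maintaining the list is brittle when tables change. One way to get "all numeric columns as BIGNUMERIC" without hard-coding them (an approach not from the original answer) is to derive the schema from the parquet file itself. A minimal sketch, assuming pyarrow is available on the worker and the file has already been pulled locally (e.g., via GCSHook); build_schema_fields is a hypothetical helper:

import pyarrow as pa
import pyarrow.parquet as pq

def build_schema_fields(parquet_path):
    """Return a BigQuery schema_fields list, mapping every decimal
    column in the parquet file to BIGNUMERIC."""
    fields = []
    for field in pq.read_schema(parquet_path):
        if pa.types.is_decimal(field.type):
            bq_type = "BIGNUMERIC"
        elif pa.types.is_integer(field.type):
            bq_type = "INTEGER"
        elif pa.types.is_floating(field.type):
            bq_type = "FLOAT"
        else:
            bq_type = "STRING"  # fallback; extend the mapping as needed
        fields.append({"name": field.name, "type": bq_type, "mode": "NULLABLE"})
    return fields

The resulting list could be produced by an upstream task and passed to schema_fields via XCom, so the DAG keeps working as source columns are added.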
