We are currently using the Airflow GCSToBigQueryOperator to load Parquet files from GCS into BigQuery. I want to declare all the numeric columns in the source as BIGNUMERIC — is that possible?
bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
)
You can manually define the schema using the schema_fields parameter of GCSToBigQueryOperator instead of relying on autodetect. Please see the updated code below:
bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    schema_fields=[
        {"name": "sample_col_1", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_2", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_3", "type": "BIGNUMERIC", "mode": "NULLABLE"},
    ],
)
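If you don't want to hand-write the list for every table, you can build schema_fields programmatically from the source file's column types (for example, obtained by inspecting the Parquet schema with pyarrow) and map every numeric type to BIGNUMERIC. The helper below is a minimal sketch under that assumption — the type-name set and the STRING fallback are illustrative simplifications, not part of the operator's API:

```python
def bignumeric_schema_fields(columns):
    """Build a BigQuery schema_fields list from a mapping of
    column name -> Parquet physical/logical type string, forcing
    every numeric column to BIGNUMERIC.

    Assumption: numeric Parquet types are named as below; anything
    else falls back to STRING (real code would map each type properly).
    """
    NUMERIC_TYPES = {"INT32", "INT64", "FLOAT", "DOUBLE", "DECIMAL"}
    return [
        {
            "name": name,
            "type": "BIGNUMERIC" if ptype.upper() in NUMERIC_TYPES else "STRING",
            "mode": "NULLABLE",
        }
        for name, ptype in columns.items()
    ]

# Hypothetical usage: pass the result straight to the operator.
fields = bignumeric_schema_fields(
    {"sample_col_1": "DOUBLE", "sample_col_2": "INT64", "label": "BYTE_ARRAY"}
)
```

You could then set schema_fields=fields in the GCSToBigQueryOperator call above.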
You may refer to this GCSToBigQueryOperator documentation for more details.