We are currently using the Airflow GCSToBigQueryOperator to load Parquet files from GCS into BigQuery. I want to declare all the numeric columns in the source as BIGNUMERIC — is that possible?
bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
)
You can manually define the schema using the schema_fields parameter of GCSToBigQueryOperator instead of relying on autodetect. Please see the updated code below:
bq_load = GCSToBigQueryOperator(
    task_id="gcs_to_bigquery_modified_airflow",
    bucket="{{ dag_run.conf['bucket'] }}",
    source_objects=["{{ dag_run.conf['name'] }}"],
    source_format="parquet",
    destination_project_dataset_table="{{ task_instance.xcom_pull(task_ids='get_destination') }}",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    schema_fields=[
        {"name": "sample_col_1", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_2", "type": "BIGNUMERIC", "mode": "NULLABLE"},
        {"name": "sample_col_3", "type": "BIGNUMERIC", "mode": "NULLABLE"},
    ],
)
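If you don't want to hand-write the list for every table, you can build schema_fields programmatically from the source file's column types (for example, obtained by inspecting the Parquet schema with pyarrow) and map every numeric type to BIGNUMERIC. The helper below is a minimal sketch under that assumption — the type-name set and the STRING fallback are illustrative simplifications, not part of the operator's API:

```python
def bignumeric_schema_fields(columns):
    """Build a BigQuery schema_fields list from a mapping of
    column name -> Parquet physical/logical type string, forcing
    every numeric column to BIGNUMERIC.

    Assumption: numeric Parquet types are named as below; anything
    else falls back to STRING (real code would map each type properly).
    """
    NUMERIC_TYPES = {"INT32", "INT64", "FLOAT", "DOUBLE", "DECIMAL"}
    return [
        {
            "name": name,
            "type": "BIGNUMERIC" if ptype.upper() in NUMERIC_TYPES else "STRING",
            "mode": "NULLABLE",
        }
        for name, ptype in columns.items()
    ]

# Hypothetical usage: pass the result straight to the operator.
fields = bignumeric_schema_fields(
    {"sample_col_1": "DOUBLE", "sample_col_2": "INT64", "label": "BYTE_ARRAY"}
)
```

You could then set schema_fields=fields in the GCSToBigQueryOperator call above.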
You may refer to this GCSToBigQueryOperator documentation for more details.