
Airflow bigquery_to_gcs operator changing field_delimiter

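I am trying to use the Airflow operator BigQueryToGCSOperator and force field_delimiter to be a pipe (|), but the output file is always comma (,) separated.

I have also tried the BigQueryToCloudStorageOperator operator, which behaves the same way.

Any idea what I am doing wrong here?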

from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)

data_to_gcs = BigQueryToGCSOperator(
    task_id="BigQuery_to_GoogleCloudBucket",
    gcp_conn_id="google_cloud_default",
    project_id=project_id,
    source_project_dataset_table=f"{project_id}.{temp_dataset_id}.{temp_table}",
    location="EU",
    print_header=True,
    destination_cloud_storage_uris=destination_uri,
    export_format="csv",  # lowercase, as in the failing setup
    field_delimiter="|",
)

Thanks in advance for your replies.

Normally, if you set the export_format field to CSV (uppercase rather than lowercase) together with field_delimiter, it should work:

from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)

data_to_gcs = BigQueryToGCSOperator(
    task_id="BigQuery_to_GoogleCloudBucket",
    gcp_conn_id="google_cloud_default",
    project_id=project_id,
    source_project_dataset_table=f"{project_id}.{temp_dataset_id}.{temp_table}",
    location="EU",
    print_header=True,
    destination_cloud_storage_uris=destination_uri,
    export_format="CSV",  # uppercase, so the delimiter is applied
    field_delimiter="|",
)

I found the following snippet in the Airflow code, and I think that setting export_format to the lowercase value csv is what causes the problem:

if self.export_format == 'CSV':
    # Only set fieldDelimiter and printHeader fields if using CSV.
    # Google does not like it if you set these fields for other export
    # formats.
    configuration['extract']['fieldDelimiter'] = self.field_delimiter
    configuration['extract']['printHeader'] = self.print_header

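In your case the comparison is case-sensitive, so this branch is never executed: field_delimiter is not written to the extract configuration, and the export falls back to the default delimiter, a comma (,).

Here you can see the default values used in this operator's constructor in the Airflow code: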
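To make the effect of the case-sensitive check concrete, here is a minimal sketch that mirrors the conditional quoted above. The function build_extract_config is a hypothetical stand-in written for illustration, not part of Airflow, and the destinationFormat key is assumed to be how the format is passed along:

```python
from typing import Any, Dict


def build_extract_config(
    export_format: str, field_delimiter: str = ",", print_header: bool = True
) -> Dict[str, Any]:
    """Hypothetical stand-in for the operator's config-building step."""
    # Assumption: the format is forwarded as-is under 'destinationFormat'.
    configuration: Dict[str, Any] = {
        "extract": {"destinationFormat": export_format}
    }
    # Same case-sensitive comparison as in the quoted Airflow snippet.
    if export_format == "CSV":
        configuration["extract"]["fieldDelimiter"] = field_delimiter
        configuration["extract"]["printHeader"] = print_header
    return configuration


lower = build_extract_config("csv", field_delimiter="|")
upper = build_extract_config("CSV", field_delimiter="|")

# With "csv" the delimiter never reaches the configuration.
print("fieldDelimiter" in lower["extract"])
# With "CSV" the pipe delimiter is included.
print(upper["extract"].get("fieldDelimiter"))
```

With the lowercase value the delimiter key is simply absent, which matches the comma-separated output described in the question.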

def __init__(
    self,
    *,
    source_project_dataset_table: str,
    destination_cloud_storage_uris: List[str],
    compression: str = 'NONE',
    export_format: str = 'CSV',
    field_delimiter: str = ',',
    print_header: bool = True,
    gcp_conn_id: str = 'google_cloud_default',
    bigquery_conn_id: Optional[str] = None,
    delegate_to: Optional[str] = None,
    labels: Optional[Dict] = None,
    location: Optional[str] = None,
    impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
    **kwargs,
)
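If the format string comes from a variable or a config file, one way to avoid this pitfall is to normalize it before passing it to the operator. The sketch below is only a suggestion: normalize_export_format and SUPPORTED_FORMATS are hypothetical names, and the set of formats is an assumption about what BigQuery table exports accept.

```python
# Assumption: these are the table export formats you intend to allow.
SUPPORTED_FORMATS = {"CSV", "NEWLINE_DELIMITED_JSON", "AVRO", "PARQUET"}


def normalize_export_format(export_format: str) -> str:
    """Hypothetical helper: uppercase the format and reject unknown values."""
    normalized = export_format.strip().upper()
    if normalized not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported export format: {export_format!r}")
    return normalized


# "csv" becomes "CSV", so the operator's CSV-only branch would be taken.
export_format = normalize_export_format("csv")
print(export_format)
```

The result can then be passed as export_format=export_format when constructing BigQueryToGCSOperator.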

