Airflow bigquery_to_gcs operator changing field_delimiter
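I'm trying to use the Airflow operator `BigQueryToGCSOperator` and force `field_delimiter` to a pipe (`|`), but the output file is always comma-delimited (`,`).
I also tried the operator `BigQueryToCloudStorageOperator`, with the same behavior.
Any idea what I'm doing wrong here?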
```python
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)

data_to_gcs = BigQueryToGCSOperator(
    task_id="BigQuery_to_GoogleCloudBucket",
    gcp_conn_id="google_cloud_default",
    project_id=project_id,
    source_project_dataset_table=f"{project_id}.{temp_dataset_id}.{temp_table}",
    location="EU",
    print_header=True,
    destination_cloud_storage_uris=destination_uri,
    export_format="csv",
    field_delimiter="|",
)
```
Thanks in advance for your replies.
It should work if you set `export_format` to `CSV` (uppercase instead of lowercase) together with `field_delimiter`:
```python
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)

data_to_gcs = BigQueryToGCSOperator(
    task_id="BigQuery_to_GoogleCloudBucket",
    gcp_conn_id="google_cloud_default",
    project_id=project_id,
    source_project_dataset_table=f"{project_id}.{temp_dataset_id}.{temp_table}",
    location="EU",
    print_header=True,
    destination_cloud_storage_uris=destination_uri,
    export_format="CSV",  # uppercase: the operator compares against "CSV" exactly
    field_delimiter="|",
)
```
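If the format string comes from configuration or user input, one way to avoid this pitfall is to normalize it before building the operator. The sketch below uses a hypothetical helper, `csv_export_kwargs`, which is not part of Airflow: it upper-cases `export_format` and returns keyword arguments you could unpack into `BigQueryToGCSOperator`.

```python
def csv_export_kwargs(
    export_format: str, field_delimiter: str = ",", print_header: bool = True
) -> dict:
    """Hypothetical helper: build the export-related kwargs for BigQueryToGCSOperator.

    export_format is upper-cased so that the operator's exact `== 'CSV'` check matches.
    """
    fmt = export_format.strip().upper()
    if fmt != "CSV" and field_delimiter != ",":
        # A custom delimiter only applies to CSV exports; flag the mismatch early.
        raise ValueError(f"field_delimiter is only used for CSV exports, got {fmt!r}")
    return {
        "export_format": fmt,
        "field_delimiter": field_delimiter,
        "print_header": print_header,
    }


# Usage (requires Airflow and the Google provider package):
# data_to_gcs = BigQueryToGCSOperator(
#     task_id="BigQuery_to_GoogleCloudBucket",
#     source_project_dataset_table=...,
#     destination_cloud_storage_uris=destination_uri,
#     **csv_export_kwargs("csv", field_delimiter="|"),
# )
```

Raising early for a non-CSV format is a design choice of this sketch; without it, the operator would silently ignore the delimiter.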
I found the following snippet in the Airflow source, and I think setting `export_format` to the lowercase value `csv` is what causes the problem:
```python
if self.export_format == 'CSV':
    # Only set fieldDelimiter and printHeader fields if using CSV.
    # Google does not like it if you set these fields for other export
    # formats.
    configuration['extract']['fieldDelimiter'] = self.field_delimiter
    configuration['extract']['printHeader'] = self.print_header
```
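To see why the lowercase value matters, the sketch below reproduces the branch above as a standalone function. It is a simplified model, not the operator's actual code: the name `build_extract_configuration` and the minimal dictionary layout are assumptions, kept only to show which keys end up in the extract job configuration.

```python
def build_extract_configuration(
    export_format: str, field_delimiter: str, print_header: bool
) -> dict:
    """Simplified model of how the operator fills the extract job configuration."""
    configuration = {"extract": {"destinationFormat": export_format}}
    if export_format == "CSV":
        # Mirrors the operator: delimiter and header are only set on an exact "CSV" match.
        configuration["extract"]["fieldDelimiter"] = field_delimiter
        configuration["extract"]["printHeader"] = print_header
    return configuration


lower = build_extract_configuration("csv", "|", True)
upper = build_extract_configuration("CSV", "|", True)
print("fieldDelimiter" in lower["extract"])  # False: no delimiter is sent to BigQuery
print(upper["extract"]["fieldDelimiter"])    # |
```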
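In your case this branch is never executed, so your `field_delimiter` is ignored and the file ends up comma-delimited, which matches the operator's default `field_delimiter` value of `,`.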
Here are the default values used in this operator's constructor, taken from the Airflow source:
```python
def __init__(
    self,
    *,
    source_project_dataset_table: str,
    destination_cloud_storage_uris: List[str],
    compression: str = 'NONE',
    export_format: str = 'CSV',
    field_delimiter: str = ',',
    print_header: bool = True,
    gcp_conn_id: str = 'google_cloud_default',
    bigquery_conn_id: Optional[str] = None,
    delegate_to: Optional[str] = None,
    labels: Optional[Dict] = None,
    location: Optional[str] = None,
    impersonation_chain: Optional[Union[str, Sequence[str]]] = None,
    **kwargs,
):
    ...  # constructor body omitted
```