Cannot run geography functions in BigQuery Airflow operator
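I'm new to both geographic data and Airflow, so please forgive me if my question is unclear, and ask me for details.
I'm trying to run a DAG through Airflow (Google Cloud Composer) that reads data from a table in a given dataset, converts one column to the GEOGRAPHY type, and dumps the result into another table: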
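import os
from datetime import datetime, timedelta

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

DAG_ID = "trees_ingestion"  # hypothetical: the question does not show how DAG_ID is defined
PROJECT = os.getenv("GCP_PROJECT")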
default_args = {
    "owner": "Airflow",
    "depends_on_past": False,
    "start_date": datetime(2020, 4, 1),
    "email": ["foobarred@gmail.com"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 5,
    "retry_delay": timedelta(minutes=1),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}
dag = DAG(DAG_ID, default_args=default_args, schedule_interval="@once", catchup=False)
ingestion_query = f"""
SELECT
id,
epsg,
SAFE.ST_GEOGFROMTEXT(geometry) as geometry,
CURRENT_TIMESTAMP() as ingested_at
FROM dataset_raw.trees
WHERE SAFE.ST_GEOGFROMTEXT(geometry) is not NULL
"""
with dag:
    etl_operator = BigQueryOperator(
        sql=ingestion_query,
        destination_dataset_table=f"{PROJECT}.dataset_clean.trees",
        write_disposition="WRITE_TRUNCATE",
        task_id="full_dump_trees",
    )
The ingestion query has been tested and runs fine in the BigQuery console.
However, the DAG fails with the following error message:
INFO - Job 150: Subtask full_dump_trees SELECT
INFO - Job 150: Subtask full_dump_trees id,
INFO - Job 150: Subtask full_dump_trees epsg,
INFO - Job 150: Subtask full_dump_trees SAFE.ST_GEOGFROMTEXT(geometry) as geometry,
INFO - Job 150: Subtask full_dump_trees CURRENT_TIMESTAMP() as ingested_at
INFO - Job 150: Subtask full_dump_trees FROM geodata_raw.trees
INFO - Job 150: Subtask full_dump_trees WHERE SAFE.ST_GEOGFROMTEXT(geometry) is not NULL
[...]
ERROR - BigQuery job failed. Final error was:
{'reason': 'invalidQuery', 'location': 'query',
'message': '5.37 - 5.46: Unrecognized function safe.st_geogfromtext\n[Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]'}. The job was: {'kind': 'bigquery#job', 'etag': 'jBrdHGpAuAGJ48Z5A/ALIA==', 'id': 'strange-terra-273917:EU.job_oDUB9c3kz0NE1JKJ5tCaeHmuGN3d', 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/strange-terra-273917/jobs/job_oDUB9c3kz0NE1JKJ5tCaeHmuGN3d?location=EU', 'user_email': 'foo@bar.iam.gserviceaccount.com', 'configuration': {'query': {'query': '\n SELECT \n id, \n epsg, \n SAFE.ST_GEOGFROMTEXT(geometry) as geometry, \n CURRENT_TIMESTAMP() as ingested_at\n FROM geodata_raw.trees\n WHERE SAFE.ST_GEOGFROMTEXT(geometry) is not NULL\n ', 'destinationTable': {'projectId': 'strange-terra-273917', 'datasetId': 'geodata_clean', 'tableId': 'trees'}, 'createDisposition': 'CREATE_IF_NEEDED', 'writeDisposition': 'WRITE_TRUNCATE', 'priority': 'INTERACTIVE', 'allowLargeResults': False, 'useLegacySql': True}, 'jobType': 'QUERY'}, 'jobReference': {'projectId': 'strange-terra-273917', 'jobId': 'job_oDUB9c3kz0NE1JKJ5tCaeHmuGN3d', 'location': 'EU'}, 'statistics': {'creationTime': '1586798059340', 'startTime': '1586798059356', 'endTime': '1586798059356'}, 'status': {'errorResult': {'reason': 'invalidQuery', 'location': 'query', 'message': '5.37 - 5.46: Unrecognized function safe.st_geogfromtext\n[Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]'}, 'errors': [{'reason': 'invalidQuery', 'location': 'query', 'message': '5.37 - 5.46: Unrecognized function safe.st_geogfromtext\n[Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]'}], 'state': 'DONE'}}
Traceback (most recent call last):
File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 930, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/airflow/airflow/contrib/operators/bigquery_operator.py", line 246, in execute
encryption_configuration=self.encryption_configuration
File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 913, in run_query
return self.run_with_configuration(configuration)
File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1344, in run_with_configuration
format(job['status']['errorResult'], job))
Exception: BigQuery job failed. Final error was: ...
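The error points to a link that seems to be deprecated. The documentation it leads to seems to say that geography functions are part of standard SQL, so I don't understand why this doesn't work.
Is this a known limitation of the Airflow BigQuery operator?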
Edit: according to the documentation, the function ST_GEOGFROMTEXT is part of what Google calls BigQuery standard SQL.
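The job resource printed in the error records which SQL dialect BigQuery actually used. A minimal sketch of reading it back in Python, assuming a dict with the same shape as the one in the error message (only the relevant fields are copied; the helper names ran_as_legacy_sql and error_message are hypothetical). The fallback to True reflects the BigQuery REST API, where useLegacySql defaults to true when it is not set.

```python
# Excerpt of the job resource from the error message; most fields omitted.
failed_job = {
    "configuration": {
        "query": {
            "query": "SELECT id, epsg, SAFE.ST_GEOGFROMTEXT(geometry) AS geometry ...",
            "useLegacySql": True,
        },
    },
    "status": {
        "errorResult": {
            "reason": "invalidQuery",
            "message": "5.37 - 5.46: Unrecognized function safe.st_geogfromtext",
        },
    },
}


def ran_as_legacy_sql(job: dict) -> bool:
    """Return True if a BigQuery query job ran with the legacy SQL dialect."""
    query_config = job.get("configuration", {}).get("query", {})
    # A missing useLegacySql is treated as true (legacy SQL) by the REST API.
    return bool(query_config.get("useLegacySql", True))


def error_message(job: dict) -> str:
    """Return the top-level error message of a finished job, or an empty string."""
    return job.get("status", {}).get("errorResult", {}).get("message", "")
```

For the failed job, ran_as_legacy_sql(failed_job) is True, which matches the "Unrecognized function" message: legacy SQL has no geography functions.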
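The job configuration in the error shows 'useLegacySql': True, so the operator ran the query as legacy SQL, where ST_GEOGFROMTEXT does not exist. You have to state explicitly in your code that you are using standard SQL: add #standardSQL to the beginning of the query, and make sure it sits on its own line, as the very first line of your SQL script.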
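A minimal sketch of the first option, assuming the query is built as a Python string in the DAG file as in the question; with_standard_sql_prefix is a hypothetical helper. It dedents and trims the query so that #standardSQL really lands on line 1 (the question's triple-quoted string starts with a newline, so its first line is empty).

```python
import textwrap

PREFIX = "#standardSQL"


def with_standard_sql_prefix(sql: str) -> str:
    """Return the query with #standardSQL as its first line (added only once)."""
    body = textwrap.dedent(sql).strip()
    lines = body.splitlines()
    if lines and lines[0].strip().lower() == PREFIX.lower():
        return body  # prefix already present on the first line
    return f"{PREFIX}\n{body}"


ingestion_query = with_standard_sql_prefix("""
    SELECT
      id,
      epsg,
      SAFE.ST_GEOGFROMTEXT(geometry) AS geometry,
      CURRENT_TIMESTAMP() AS ingested_at
    FROM dataset_raw.trees
    WHERE SAFE.ST_GEOGFROMTEXT(geometry) IS NOT NULL
""")
```

The helper is idempotent, so calling it on a query that already starts with the prefix leaves it unchanged.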
Or
you can set it on the BigQueryOperator by passing use_legacy_sql=False:
use_legacy_sql (bool) – Whether to use legacy SQL (true) or standard SQL (false).
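A sketch of the second option applied to the question's operator. It assumes the Airflow 1.10 contrib import path seen in the traceback; the DAG id, the fallback project name and the shortened query are hypothetical placeholders. The import is guarded only so the snippet can be loaded without Airflow installed; the part that matters is use_legacy_sql=False, which should make the job's useLegacySql flag false instead of the True shown in the error.

```python
import os
from datetime import datetime

try:
    # Airflow 1.10 import paths, as in the question's traceback.
    from airflow import DAG
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator
except ImportError:  # Airflow not installed: only operator_kwargs is usable
    DAG = BigQueryOperator = None

PROJECT = os.getenv("GCP_PROJECT", "my-project")  # "my-project" is a hypothetical fallback

# Shortened, hypothetical version of the question's ingestion query.
INGESTION_QUERY = """
SELECT id, epsg, SAFE.ST_GEOGFROMTEXT(geometry) AS geometry
FROM dataset_raw.trees
WHERE SAFE.ST_GEOGFROMTEXT(geometry) IS NOT NULL
"""


def operator_kwargs(sql: str, project: str) -> dict:
    """Arguments for the question's BigQueryOperator, switched to standard SQL."""
    return {
        "task_id": "full_dump_trees",
        "sql": sql,
        "destination_dataset_table": f"{project}.dataset_clean.trees",
        "write_disposition": "WRITE_TRUNCATE",
        # False selects standard SQL, where ST_GEOGFROMTEXT and SAFE. exist.
        "use_legacy_sql": False,
    }


if BigQueryOperator is not None:
    with DAG(
        "trees_ingestion",  # hypothetical DAG id
        start_date=datetime(2020, 4, 1),
        schedule_interval="@once",
        catchup=False,
    ) as dag:
        etl_operator = BigQueryOperator(**operator_kwargs(INGESTION_QUERY, PROJECT))
```

Keeping the arguments in one function makes the dialect choice visible in a single place if more BigQuery tasks are added to the DAG.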