Cannot run geography functions in BigQuery Airflow operator

I am pretty new to both geography data and Airflow, so please forgive me and ask for clarification if my question is not clear.

I am trying to run a DAG through Airflow (Google Cloud Composer) that reads data from a table in a specific dataset, converts a specific column to the GEOGRAPHY type, and dumps the result into another table:


import os
from datetime import datetime, timedelta

from airflow import DAG
# Import path matches the module shown in the traceback below.
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

PROJECT = os.getenv("GCP_PROJECT")
DAG_ID = "full_dump_trees"  # placeholder: the original post does not show this value

default_args = {
    "owner": "Airflow",
    "depends_on_past": False,
    "start_date": datetime(2020, 4, 1),
    "email": ["foobarred@gmail.com"],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 5,
    "retry_delay": timedelta(minutes=1),
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

dag = DAG(DAG_ID, default_args=default_args, schedule_interval="@once", catchup=False)

# Parse the text column into GEOGRAPHY and keep only the rows that parse successfully.
ingestion_query = """
            SELECT
                id,
                epsg,
                SAFE.ST_GEOGFROMTEXT(geometry) as geometry,
                CURRENT_TIMESTAMP() as ingested_at
            FROM dataset_raw.trees
            WHERE SAFE.ST_GEOGFROMTEXT(geometry) is not NULL
        """

with dag:
    etl_operator = BigQueryOperator(sql=ingestion_query,
                                    destination_dataset_table=f"{PROJECT}.dataset_clean.trees",
                                    write_disposition="WRITE_TRUNCATE",
                                    task_id="full_dump_trees")

The ingestion query has been tested and works from the BigQuery console.

However, when running the DAG, it fails with the following error message:

INFO - Job 150: Subtask full_dump_trees             SELECT 
INFO - Job 150: Subtask full_dump_trees                 id, 
INFO - Job 150: Subtask full_dump_trees                 epsg, 
INFO - Job 150: Subtask full_dump_trees                 SAFE.ST_GEOGFROMTEXT(geometry) as geometry, 
INFO - Job 150: Subtask full_dump_trees                 CURRENT_TIMESTAMP() as ingested_at
INFO - Job 150: Subtask full_dump_trees             FROM geodata_raw.trees
INFO - Job 150: Subtask full_dump_trees             WHERE SAFE.ST_GEOGFROMTEXT(geometry) is not NULL
[...]
ERROR - BigQuery job failed. Final error was: 
{'reason': 'invalidQuery', 'location': 'query', 
 'message': '5.37 - 5.46: Unrecognized function safe.st_geogfromtext\n[Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]'}. The job was: {'kind': 'bigquery#job', 'etag': 'jBrdHGpAuAGJ48Z5A/ALIA==', 'id': 'strange-terra-273917:EU.job_oDUB9c3kz0NE1JKJ5tCaeHmuGN3d', 'selfLink': 'https://bigquery.googleapis.com/bigquery/v2/projects/strange-terra-273917/jobs/job_oDUB9c3kz0NE1JKJ5tCaeHmuGN3d?location=EU', 'user_email': 'foo@bar.iam.gserviceaccount.com', 'configuration': {'query': {'query': '\n            SELECT \n                id, \n                epsg, \n                SAFE.ST_GEOGFROMTEXT(geometry) as geometry, \n                CURRENT_TIMESTAMP() as ingested_at\n            FROM geodata_raw.trees\n            WHERE SAFE.ST_GEOGFROMTEXT(geometry) is not NULL\n        ', 'destinationTable': {'projectId': 'strange-terra-273917', 'datasetId': 'geodata_clean', 'tableId': 'trees'}, 'createDisposition': 'CREATE_IF_NEEDED', 'writeDisposition': 'WRITE_TRUNCATE', 'priority': 'INTERACTIVE', 'allowLargeResults': False, 'useLegacySql': True}, 'jobType': 'QUERY'}, 'jobReference': {'projectId': 'strange-terra-273917', 'jobId': 'job_oDUB9c3kz0NE1JKJ5tCaeHmuGN3d', 'location': 'EU'}, 'statistics': {'creationTime': '1586798059340', 'startTime': '1586798059356', 'endTime': '1586798059356'}, 'status': {'errorResult': {'reason': 'invalidQuery', 'location': 'query', 'message': '5.37 - 5.46: Unrecognized function safe.st_geogfromtext\n[Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]'}, 'errors': [{'reason': 'invalidQuery', 'location': 'query', 'message': '5.37 - 5.46: Unrecognized function safe.st_geogfromtext\n[Try using standard SQL (https://cloud.google.com/bigquery/docs/reference/standard-sql/enabling-standard-sql)]'}], 'state': 'DONE'}}
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models/taskinstance.py", line 930, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/bigquery_operator.py", line 246, in execute
    encryption_configuration=self.encryption_configuration
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 913, in run_query
    return self.run_with_configuration(configuration)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1344, in run_with_configuration
    format(job['status']['errorResult'], job))
Exception: BigQuery job failed. Final error was: ...

The error points to a link which seems to be deprecated. The documentation it sends me back to seems to say that geography functions are part of standard SQL, so I am at a loss as to why this would not work.

Is that a known limitation of the Airflow BigQuery operators?

EDIT: As per the documentation, the function ST_GEOGFROMTEXT is part of what Google calls standard SQL for BigQuery.

You must explicitly state that you are using standard SQL in your code. Either add #standardSQL at the beginning of your query (make sure it is on its own line, and that it is the first line of your SQL script),
or
set this within BigQueryOperator:

use_legacy_sql (bool) – Whether to use legacy SQL (true) or standard SQL (false).
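
A minimal sketch of the second option, applied to the task from the question. It assumes the same contrib import path that appears in the traceback; the helper names `etl_task_kwargs` and `build_dag` and the DAG id are hypothetical, introduced only for illustration. The job configuration in the error log shows 'useLegacySql': True, so the parameter has to be switched off explicitly:

```python
import os
from datetime import datetime

# Query from the question; SAFE.ST_GEOGFROMTEXT requires standard SQL.
INGESTION_QUERY = """
SELECT
    id,
    epsg,
    SAFE.ST_GEOGFROMTEXT(geometry) AS geometry,
    CURRENT_TIMESTAMP() AS ingested_at
FROM dataset_raw.trees
WHERE SAFE.ST_GEOGFROMTEXT(geometry) IS NOT NULL
"""


def etl_task_kwargs(project):
    """Return BigQueryOperator keyword arguments with legacy SQL switched off.

    Hypothetical helper, not part of Airflow.
    """
    return {
        "task_id": "full_dump_trees",
        "sql": INGESTION_QUERY,
        "destination_dataset_table": f"{project}.dataset_clean.trees",
        "write_disposition": "WRITE_TRUNCATE",
        # Without this, the job in the question was submitted with useLegacySql=True.
        "use_legacy_sql": False,
    }


def build_dag(dag_id="geo_ingestion_example"):  # hypothetical DAG id
    """Build the DAG with a single standard-SQL BigQuery task."""
    # Imported here so the kwargs helper can be inspected without Airflow installed.
    from airflow import DAG
    from airflow.contrib.operators.bigquery_operator import BigQueryOperator

    dag = DAG(
        dag_id,
        start_date=datetime(2020, 4, 1),
        schedule_interval="@once",
        catchup=False,
    )
    BigQueryOperator(dag=dag, **etl_task_kwargs(os.getenv("GCP_PROJECT")))
    return dag
```

In a real DAG file you would assign `dag = build_dag()` at module level so that the scheduler can discover it.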
