
BigQuery Execute fails with no meaningful error on Cloud Data Fusion

I'm trying to use the BigQuery Execute function in Cloud Data Fusion (Google). The component validates fine and the SQL checks out, but I get this non-meaningful error with every execution:

02/11/2022 12:51:25 ERROR Pipeline 'test-bq-execute' failed.
02/11/2022 12:51:25 ERROR Workflow service 'workflow.default.test-bq-execute.DataPipelineWorkflow.<guid>' failed.
02/11/2022 12:51:25 ERROR Program DataPipelineWorkflow execution failed.

I can see nothing else to help me debug this. Any ideas? The SQL in question is a simple DELETE from dataset.table WHERE ds = CURRENT_DATE().
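For reference, this is the exact statement as configured in the plugin (it also appears verbatim in the pipeline export below):

DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()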


This was the pipeline:

{
    "name": "test-bq-execute",
    "description": "Data Pipeline Application",
    "artifact": {
        "name": "cdap-data-pipeline",
        "version": "6.5.1",
        "scope": "SYSTEM"
    },
    "config": {
        "resources": {
            "memoryMB": 2048,
            "virtualCores": 1
        },
        "driverResources": {
            "memoryMB": 2048,
            "virtualCores": 1
        },
        "connections": [],
        "comments": [],
        "postActions": [],
        "properties": {},
        "processTimingEnabled": true,
        "stageLoggingEnabled": false,
        "stages": [
            {
                "name": "BigQuery Execute",
                "plugin": {
                    "name": "BigQueryExecute",
                    "type": "action",
                    "label": "BigQuery Execute",
                    "artifact": {
                        "name": "google-cloud",
                        "version": "0.18.1",
                        "scope": "SYSTEM"
                    },
                    "properties": {
                        "project": "auto-detect",
                        "sql": "DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()",
                        "dialect": "standard",
                        "mode": "batch",
                        "dataset": "GCPQuickStart",
                        "table": "account",
                        "useCache": "false",
                        "location": "US",
                        "rowAsArguments": "false",
                        "serviceAccountType": "filePath",
                        "serviceFilePath": "auto-detect"
                    }
                },
                "outputSchema": [
                    {
                        "name": "etlSchemaBody",
                        "schema": ""
                    }
                ],
                "id": "BigQuery-Execute",
                "type": "action",
                "label": "BigQuery Execute",
                "icon": "fa-plug"
            }
        ],
        "schedule": "0 1 */1 * *",
        "engine": "spark",
        "numOfRecordsPreview": 100,
        "maxConcurrentRuns": 1
    }
}

I was able to catch the error using Cloud Logging. To enable Cloud Logging in Cloud Data Fusion, you may use this GCP documentation, and follow these steps to view the logs from Data Fusion in Cloud Logging. Replicating your scenario, this is the error I found:

      "logMessage": "Program DataPipelineWorkflow execution failed.\njava.util.concurrent.ExecutionException: com.google.cloud.bigquery.BigQueryException: Cannot set destination table in jobs with DML statements\n    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)\n    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)\n    at io.cdap.cdap.internal.app.runtime.distributed.AbstractProgramTwillRunnable.run(AbstractProgramTwillRunnable.java:274)\n    at org.apache.twill.interna..."
    }
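As a side note, the same entries can be pulled from the command line instead of the Logs Explorer. A minimal sketch using gcloud, assuming you are authenticated against the project running the pipeline (the exact filter may need adjusting to how your instance writes logs):

gcloud logging read 'severity>=ERROR AND "DataPipelineWorkflow"' --limit=20 --freshness=1d

The quoted string acts as a free-text match across the whole log entry, so it catches the failure message regardless of which payload field it lands in.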

What we did to resolve this error (Cannot set destination table in jobs with DML statements) was to leave the Dataset Name and Table Name empty in the pipeline properties, since BigQuery does not allow a destination table to be set on jobs that run DML statements.
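For illustration, a sketch of how the plugin's properties block from the pipeline above would look after the change, with the dataset and table keys dropped (depending on the Studio version, the export may instead keep them as empty strings):

"properties": {
    "project": "auto-detect",
    "sql": "DELETE FROM GCPQuickStart.account WHERE ds = CURRENT_DATE()",
    "dialect": "standard",
    "mode": "batch",
    "useCache": "false",
    "location": "US",
    "rowAsArguments": "false",
    "serviceAccountType": "filePath",
    "serviceFilePath": "auto-detect"
}

Everything else in the pipeline definition stays the same.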


Output:

(screenshots of the successful pipeline run)
