
Google GCP Cloud Functions to BigQuery Error

I've created a Cloud Function for sending data to BigQuery. The Cloud Function receives its data from Pub/Sub.

Scenario 1: I wrote Python code that sends JSON data directly to BigQuery; no problem.
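
(For reference, a minimal sketch of what that direct insert might look like, assuming a streaming insert with insert_rows_json; the dataset and table names here are placeholders, not the asker's actual names:)

    from google.cloud import bigquery

    # "my_dataset" and "my_table" are hypothetical names for illustration only.
    client = bigquery.Client()
    table_ref = client.dataset("my_dataset").table("my_table")
    errors = client.insert_rows_json(table_ref, [{"field1": "data1", "field2": "data2"}])
    if errors:
        print(errors)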

Scenario 2: I saved the JSON data to a .json file and used the bq load command to manually upload it to BigQuery; no problem.

Scenario 3 (where the error comes in): the Cloud Function can receive data from Pub/Sub, but cannot send it to BigQuery.

Here's the code of the Cloud Function:

from google.cloud import bigquery
import base64, json, sys, os

def pubsub_to_bq(event, context):
   if 'data' in event:
      print("Event Data is found : " + str(event['data']))
      name = base64.b64decode(event['data']).decode('utf-8')
   else:
      name = 'World'
   print('Hello {}!'.format(name))


   pubsub_message = base64.b64decode(event['data']).decode('utf-8')
   print(pubsub_message)
   to_bigquery(os.environ['dataset'], os.environ['table'], json.loads(str(pubsub_message)))

def to_bigquery(dataset, table, document):
   bigquery_client = bigquery.Client()
   table = bigquery_client.dataset(dataset).table(table)
   
   job_config.source_format = bq.SourceFormat.NEWLINE_DELIMITED_JSON
   job_config = bq.LoadJobConfig()
   job_config.autodetect = True
   
   errors = bigquery_client.insert_rows_json(table,json_rows=[document],job_config=job_config)
   if errors != [] :
      print(errors, file=sys.stderr)

I've tried the JSON data in both formats, but no luck: [{"field1":"data1","field2":"data2"}] or {"field1":"data1","field2":"data2"}
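
(As a side note, since insert_rows_json expects a list of row dicts, one way to tolerate both payload shapes is to normalize the parsed JSON before passing it on; a sketch:)

    import json

    payload = json.loads(pubsub_message)
    # Wrap a single JSON object in a list; pass a JSON array through as-is.
    rows = payload if isinstance(payload, list) else [payload]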

All I could get from the Cloud Functions event logs was this error message: textPayload: "Function execution took 100 ms, finished with status: 'crash'"

Could any expert help me on this? Thanks.

If you have a look at the library code, for insert_rows_json you have this:

    def insert_rows_json(
        self,
        table,
        json_rows,
        row_ids=None,
        skip_invalid_rows=None,
        ignore_unknown_values=None,
        template_suffix=None,
        retry=DEFAULT_RETRY,
        timeout=None,
    ):

No job_config parameter! The crash should come from this mistake.

The method insert_rows_json performs a streaming insert, not a load job.
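
(So, if a streaming insert is acceptable, a minimal fix is simply to drop the job_config handling; a sketch of the asker's to_bigquery adjusted that way:)

    from google.cloud import bigquery
    import sys

    def to_bigquery(dataset, table, document):
        bigquery_client = bigquery.Client()
        table_ref = bigquery_client.dataset(dataset).table(table)
        # insert_rows_json streams rows directly into the table;
        # no LoadJobConfig is involved.
        errors = bigquery_client.insert_rows_json(table_ref, json_rows=[document])
        if errors:
            print(errors, file=sys.stderr)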

For a load job from JSON, you can use the load_table_from_json method, which you can also find in the source code of the library. Its signature looks like this (note the job_config option):

    def load_table_from_json(
        self,
        json_rows,
        destination,
        num_retries=_DEFAULT_NUM_RETRIES,
        job_id=None,
        job_id_prefix=None,
        location=None,
        project=None,
        job_config=None,
    ):
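
(Putting that together, a sketch of the asker's to_bigquery rewritten as a load job; load_table_from_json returns a job object, and calling result() waits for the job to finish and raises on failure:)

    from google.cloud import bigquery

    def to_bigquery(dataset, table, document):
        bigquery_client = bigquery.Client()
        table_ref = bigquery_client.dataset(dataset).table(table)

        job_config = bigquery.LoadJobConfig()
        job_config.autodetect = True  # let BigQuery infer the schema

        # load_table_from_json does accept a job_config and runs a load job.
        load_job = bigquery_client.load_table_from_json(
            [document], table_ref, job_config=job_config
        )
        load_job.result()  # wait for completion; raises on error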
