
BigQuery bq load Internal Error

Background

I am trying to load a JSON file, x.json, using the bq CLI.

cat x.json

{"name":"xyz","mobile":"xxx","location":"abc"}

{"name":"xyz","mobile":"xxx","age":"22"}

Command Used

bq load --autodetect --source_format=NEWLINE_DELIMITED_JSON project:test_datasets.cust x.json

'cust' is a table with an empty schema.

I am using '--autodetect' so that BigQuery detects the schema automatically.

Output

Upload complete.

Waiting on bqjob_r475558282b85c552_000001569cf1efd8_1 ... (1s) Current status: DONE
BigQuery error in load operation: Error processing job 'project:bqjob_r475558282b85c552_000001569cf1efd8_1': An internal error occurred and the request could not be completed.

Any thoughts on why the internal error occurs and how to resolve it?

We have seen several problems:

  • the request randomly fails with type 'Backend error'
  • the request randomly fails with type 'Connection error'
  • the request randomly fails with type 'timeout' (watch out here, as only some rows fail, not the whole payload)
  • some other error messages are so vague that they don't help you; just retry.
  • we see hundreds of such failures each day, so they are pretty much constant and not related to Cloud health.

For all of these we opened cases with paid Google Enterprise Support, but unfortunately they did not resolve them. It seems the recommended option is to retry with exponential backoff; even support told us to do so. Also, the failure rate fits the 99.9% uptime we have in the SLA, so there are no grounds for objection.
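The exponential backoff with retry recommended above can be sketched as a small shell wrapper around any command, including bq load. This is a minimal sketch, not something from the original answer: the function name retry_with_backoff and its two leading arguments (maximum attempts, initial delay in seconds) are hypothetical, and it assumes the wrapped command signals failure with a non-zero exit status.

```shell
#!/usr/bin/env bash
# retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
# Hypothetical helper: runs the command until it succeeds or the attempt
# limit is reached, doubling the wait between attempts.
retry_with_backoff() {
  local max_attempts=$1
  local delay=$2
  shift 2
  local attempt=1
  while true; do
    # Success: stop retrying.
    if "$@"; then
      return 0
    fi
    # Out of attempts: report and give up.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))      # exponential growth of the wait
    attempt=$((attempt + 1))
  done
}

# Example (not executed here): up to 5 attempts, waiting 2s, 4s, 8s, 16s.
# retry_with_backoff 5 2 bq load --autodetect \
#   --source_format=NEWLINE_DELIMITED_JSON project:test_datasets.cust x.json
```

Note that this retries on any non-zero exit status, so a permanent problem such as a malformed file is retried as well; the attempt limit keeps that bounded. Adding a small random jitter to the delay is a common refinement when many clients retry at once.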

There's something to keep in mind regarding the SLA: it is a very strictly defined structure, the details are here. The 99.9% figure is uptime and does not translate directly into a failure rate. This means that if BQ has 30 minutes of downtime in one month, and you do 10,000 inserts within that period but no inserts at any other time of the month, the numbers will be skewed. This is why we suggest an exponential backoff algorithm. The SLA is explicitly based on uptime and not error rate, but logically the two correlate closely if you do streaming inserts throughout the month at different times with a backoff-retry setup. Technically, you should experience on average about 1 failed insert in 1,000 if you insert throughout the month and have set up a proper retry mechanism.
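To put the 99.9% figure in perspective, the downtime it allows can be computed directly. This is a rough sketch that assumes a 30-day measurement period; the helper name allowed_downtime_minutes is hypothetical, and the actual SLA terms define how downtime is measured.

```shell
#!/usr/bin/env bash
# allowed_downtime_minutes DAYS UPTIME_PERCENT
# Hypothetical helper: prints how many minutes of downtime fit inside an
# uptime target over a period of DAYS days.
allowed_downtime_minutes() {
  awk -v days="$1" -v uptime="$2" \
    'BEGIN { printf "%.1f\n", days * 24 * 60 * (100 - uptime) / 100 }'
}

allowed_downtime_minutes 30 99.9
```

Under that assumption the result is 43.2 minutes, so the 30-minute outage in the example above would still sit inside a 99.9% target even though every insert sent during it fails.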

You can check out this chart about your project health: https://console.developers.google.com/project/YOUR-APP-ID/apiui/apiview/bigquery?tabId=usage&duration=P1D

Try this:

bq \
--project_id=your_project_id \
--location=US \
load \
--autodetect \
--source_format=NEWLINE_DELIMITED_JSON \
'your_dataset.your_table' \
x.json
