What is BigQuery DML Quota restriction

I was under the impression that BigQuery DML no longer has an INSERT restriction. However, one of my ingestion workflows, which runs INSERT DML uniformly distributed across the day (on average 80 DML statements every 2 minutes, ~35-70B record transformations and aggregations per day), fails occasionally.

What's strange is that I am seeing only a few INSERT DML queries fail, and only during certain hours. When looking at pending jobs around the time the errors took place, there were no more than a few query jobs in the pending state. All DML queries use a reservation.

I wonder what the quota is and how it is computed, given that the documentation claims there is no DML INSERT quota.

Here is the error I am seeing:

{"location":"max_dml_outstanding_per_table","message":"Quota exceeded: Your table exceeded quota for total number of dml jobs writing to a table, pending + running. For more information, see https://cloud.google.com/bigquery/troubleshooting-errors","reason":"quotaExceeded"}

To better understand my case, I am using a bqtail ingestion workflow defined as:

When:
  Prefix: "/xxxxx/xxxx/"
  Suffix: ".gz"
Async: true
Batch:
  MultiPath: true
  Window:
    DurationInSec: 120

Dest:
  Pattern: '.+/(\d{4})/(\d{2})/(\d{2})/.+'
  Table: myproject.selector.selection_$Mod(80)_$1$2$3
  SourceFormat: NEWLINE_DELIMITED_JSON
  Transient:
    Dataset: temp
    Balancer:
      MaxLoadJobs: 100
      ProjectIDs:
        - myproject-transient1
        - myproject-transient2
        - myproject-transient3
        - myproject-transient4
        - myproject-transient5
  Schema:
    Template: myproject.selector.selection_tmpl
  SchemaUpdateOptions:
    - ALLOW_FIELD_ADDITION
  WriteDisposition: WRITE_APPEND

OnSuccess:
  - Action: query
    Request:
      SQL: INSERT INTO `myproject.selector.xxx_agg1`( ....) SELECT ... FROM $TempTable GROUP BY x, y, z
    OnSuccess:
      - Action: query
        Request:
          SQL: INSERT INTO `myproject.selector.xxx_agg2`( ....) SELECT ... FROM $TempTable GROUP BY x1, y1, z1
        OnSuccess:
          - Action: delete

In summary: every two minutes, data files are batched into up to 80 load requests into transient tables and then copied to the final destination tables: myproject.selector.selection_$Mod(80)_$1$2$3 (80 different tables suffixed by date).

Ingestion work is distributed between 5 transient projects; a typical batch is up to 1M records, with each load job taking around 22 seconds and each copy job taking 1 second. After a successful copy, the first DML executes, then the second.

As per the GCP support team, the error message indicated that the DML jobs were hitting the DML_ALL_JOBS_CONCURRENT limit.

DML_ALL_JOBS_CONCURRENT is not a numbered limit itself; it just gets triggered when either the INSERT concurrency limit or the UPDATE/DELETE/MERGE concurrency limit has been reached.

DML INSERT allows up to 100 concurrent jobs (pending + running) per table. DML MERGE/UPDATE/DELETE allows up to 20 concurrent jobs (pending + running) per table.

This quota error means you're submitting jobs faster than BigQuery can finish them. BigQuery can only run a certain number of DML jobs concurrently on a table (running jobs). When jobs are received after this limit is exceeded, they are put into a queue to wait for execution (pending jobs). When the limit of this queue is also exceeded, you receive this quota error: "Your table exceeded quota for total number of dml jobs writing to a table, pending + running".
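
One way to avoid filling that queue in the first place is to gate submissions per destination table on the client side. Below is a minimal sketch assuming a Python-based workflow runner; the helper is purely hypothetical (it is not part of any BigQuery API) and hard-codes the per-table limits quoted above:

import threading

# Per-table limits quoted above: 100 concurrent INSERT jobs, 20 concurrent
# MERGE/UPDATE/DELETE jobs, each counted as pending + running.
INSERT_LIMIT = 100
MUTATE_LIMIT = 20

_semaphores = {}
_registry_lock = threading.Lock()

def dml_slot(table_id, is_insert=True):
    """Return the per-table semaphore, creating it on first use."""
    limit = INSERT_LIMIT if is_insert else MUTATE_LIMIT
    with _registry_lock:
        return _semaphores.setdefault(table_id, threading.BoundedSemaphore(limit))

# Hold the slot for the whole life of the job, submission through completion:
# with dml_slot("myproject.selector.xxx_agg1"):
#     client.query(sql).result()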

Retrying the submission of these jobs with exponential backoff should help in this situation.
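
For illustration, here is a minimal retry sketch assuming the google-cloud-bigquery Python client; the attempt count, backoff cap, and the string match on the error reason are assumptions of mine, not values prescribed by the support team:

import random
import time

from google.api_core.exceptions import Forbidden
from google.cloud import bigquery

client = bigquery.Client()

def run_dml_with_backoff(sql, max_attempts=6):
    """Submit a DML job and retry quotaExceeded failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            client.query(sql).result()  # blocks until the job finishes
            return
        except Forbidden as exc:  # quotaExceeded surfaces as HTTP 403
            if "quotaExceeded" not in str(exc):
                raise
            # Sleep 2^attempt seconds plus jitter, capped at 60s (illustrative values).
            time.sleep(min(2 ** attempt + random.random(), 60))
    raise RuntimeError("DML job still exceeding the per-table quota after retries")

# run_dml_with_backoff("INSERT INTO `myproject.selector.xxx_agg1` ...")  # the elided statement from the workflow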

The streaming API is another option for frequent, small appends to the table. It allows for a much higher QPS.
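
As a sketch of that option, again assuming the google-cloud-bigquery Python client (the row payload and field names below are hypothetical): insert_rows_json goes through the streaming API rather than creating a DML job, so it is not subject to the per-table DML concurrency limits discussed above:

from google.cloud import bigquery

client = bigquery.Client()

rows = [
    {"x": "a", "y": 1},  # hypothetical fields matching the destination schema
    {"x": "b", "y": 2},
]

# Streams rows directly into the table; returns a list of per-row errors.
errors = client.insert_rows_json("myproject.selector.xxx_agg1", rows)
if errors:
    raise RuntimeError("streaming insert failed: %s" % errors)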

Thanks to your comment, it appears you are reaching one limit. When you look at the quota documentation for DML, you can read this:

BigQuery DML statements have no quota limits.

However, DML statements are counted toward the maximum number of table operations per day and partition modifications per day. DML statements will not fail due to these limits.

In addition, DML statements are subject to the maximum rate of table metadata update operations. If you exceed this limit, retry the operation using exponential backoff between retries.

If you follow the latter link, on the table metadata update operations, you find this:

Maximum rate of table metadata update operations — 5 operations every 10 seconds per table

The table metadata update limit includes all metadata update operations performed by using the Cloud Console, the classic BigQuery web UI, the bq command-line tool, the client libraries, by calling the tables.insert, tables.patch, or tables.update API methods, or by executing ALTER TABLE DDL statements. This limit also applies to job output.


So, in summary, you perform more than 5 tables.insert operations over 10 seconds (80 DML jobs every 2 minutes averages roughly 6.7 every 10 seconds), you hit this limit, and you should retry with exponential backoff.

Sometimes the quota is tolerant, sometimes it rejects you. This can depend on the global platform status.

Try to make only one insert request into the same table (you can use temp tables and then perform a single global insert query over all these temp tables).
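
Here is a sketch of that idea, assuming the google-cloud-bigquery Python client and temp tables whose schemas match the destination (the temp table names are hypothetical): the UNION ALL collapses what would be many per-batch DML jobs into a single insert against the destination table:

from google.cloud import bigquery

client = bigquery.Client()

temp_tables = ["temp.batch_001", "temp.batch_002", "temp.batch_003"]

# Assumes every temp table has the same column order as the destination.
union_sql = " UNION ALL ".join("SELECT * FROM `%s`" % t for t in temp_tables)
client.query(
    "INSERT INTO `myproject.selector.xxx_agg1` " + union_sql
).result()  # one DML job instead of one per temp table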
