
BigQuery - Inequality vs Equality Joins - maximumBillingTier

Here is the inequality condition that I have in my join (a simple overlap condition):

ON
(A.start <= B.End) AND (B.Start <= A.END)

It gives me the following error:

java.lang.RuntimeException: BigQueryError{reason=billingTierLimitExceeded, location=null, message=Query exceeded resource limits. 700920.3330645757 CPU seconds were used, and this query must use less than 529900.0 CPU seconds.

Surprisingly, this operation takes longer than running the sequential algorithm (without any join) on a single instance (n1-highmem-16).

I have a couple of questions:

1) How can I calculate maximumBillingTier for my query?

2) Can someone explain how inequality joins work in BigQuery?

3) Why are inequality joins so expensive?
Is it because of the number of operations, or because of the large number of output rows?

For the same query and input tables, the inequality join runs for more than 13000 seconds and is eventually canceled due to a time-out, but if I change the condition to use only equality, it takes only 70 seconds.

Thanks!

1) How can I calculate maximumBillingTier for my query?

I think this comes down to the notion of slots:

A BigQuery slot is a unit of computational capacity required to execute SQL queries. BigQuery automatically calculates how many slots are required by each query, depending on query size and complexity.

The default number of slots for on-demand queries is shared among all queries in a single project. As a rule, if you're processing less than 100 GB of queries at once, you're unlikely to be using all 2,000 slots.

To check how many slots you're using, see Monitoring BigQuery Using Stackdriver.

See more details at Query Jobs Quotas.

2) Can someone explain how inequality joins work in BigQuery?

This really depends on data size and distribution.
I would recommend Query Plan Explanation: it helps not only in understanding what is going on under the hood, but also in optimizing your query.
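
To make the cost gap between the two join types more concrete, here is a toy Python model. It is only a sketch under stated assumptions, not a description of how BigQuery actually executes joins: it assumes an inequality join with no equality key has to test every pair of rows, while an equality join can hash one side and probe it once per row. The function names and the sample intervals are hypothetical.

```python
from collections import defaultdict


def overlap_join(a_rows, b_rows):
    """Naive interval-overlap join: tests every (a, b) pair."""
    comparisons = 0
    matches = []
    for a_start, a_end in a_rows:
        for b_start, b_end in b_rows:
            comparisons += 1
            # Same predicate as the ON clause in the question.
            if a_start <= b_end and b_start <= a_end:
                matches.append(((a_start, a_end), (b_start, b_end)))
    return matches, comparisons


def equality_join(a_rows, b_rows):
    """Hash join on the start value: build an index once, probe once per row."""
    index = defaultdict(list)
    for b in b_rows:
        index[b[0]].append(b)
    probes = 0
    matches = []
    for a in a_rows:
        probes += 1
        for b in index.get(a[0], []):
            matches.append((a, b))
    return matches, probes


# Hypothetical sample data: 1,000 unit-length intervals on each side.
a_rows = [(i, i + 1) for i in range(1000)]
b_rows = [(i, i + 1) for i in range(1000)]

overlap_matches, overlap_work = overlap_join(a_rows, b_rows)
equal_matches, equal_work = equality_join(a_rows, b_rows)

print("overlap join:", overlap_work, "comparisons,", len(overlap_matches), "rows")
print("equality join:", equal_work, "probes,", len(equal_matches), "rows")
```

In this toy model the overlap join performs 1,000,000 comparisons for two 1,000-row inputs, while the equality join performs 1,000 probes. With wider intervals the overlap join would also emit many more rows, so both factors raised in the question (number of operations and output size) can contribute.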
