How do you calculate a boolean aggregate over a column in BigQuery?

I have a table of user events. I want to project each event into a new column using some predicate, and then aggregate the events per user into a projection that tells me whether the predicate has ever matched for that user, whether it has never matched, and so on.

In other languages these are usually called all() and any(): you pass a list of boolean values and get back whether all of them are true, or whether at least one is. This is equivalent to applying a boolean AND across all the values (for all) or a boolean OR across all the values (for any).

Does BigQuery have this feature? I can sort of approximate it using max and min, but it's not ideal.

Example:

select
month(date_time) m,
count(*) as ct,
max(id_is_present),
min(id_is_present),
max(starts_with_one) max_one,
min(starts_with_one) min_one,
from
(
    select
    length(user_id) > 1 id_is_present,
    regexp_match(user_id, r'^1') starts_with_one,
    date_time
    from
    [user_events.2015_02]
)
group by
m

It exploits the fact that max(true, false, false) yields true: max over a boolean column acts like any, and min acts like all, so you could sort of build both from there.

Is this the hack I have to rely on, or does BigQuery support boolean aggregates?

Yes, BigQuery has such aggregation functions. It uses the SQL standard names for them:

EVERY (will do logical and)
SOME (will do logical or)
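
Applied to the query from the question, that might look like the sketch below. It assumes the legacy SQL dialect (which the bracketed table name in the question suggests) and reuses the question's table and column names; the output aliases any_id_present, all_id_present, any_starts_with_one and all_starts_with_one are made up for illustration.

```sql
-- Legacy SQL sketch: SOME acts as a logical OR, EVERY as a logical AND.
select
  month(date_time) m,
  count(*) as ct,
  SOME(id_is_present) any_id_present,            -- true if any event matched
  EVERY(id_is_present) all_id_present,           -- true only if every event matched
  SOME(starts_with_one) any_starts_with_one,
  EVERY(starts_with_one) all_starts_with_one
from
(
  select
    length(user_id) > 1 id_is_present,
    regexp_match(user_id, r'^1') starts_with_one,
    date_time
  from
    [user_events.2015_02]
)
group by
  m
```

To get one row per user rather than per month, the same aggregates should work with user_id selected in the inner query and used as the grouping key.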

In case someone else stumbles across this: BigQuery's standard SQL dialect offers logical_and() and logical_or() for this. So the code could be written as:

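-- Standard SQL: extract() replaces month(), regexp_contains() replaces
-- regexp_match(), and the table name is quoted with backticks, not brackets.
select extract(month from date_time) as m, count(*) as ct,
       logical_or(id_is_present) as any_id_present,
       logical_and(id_is_present) as all_id_present,
       logical_or(starts_with_one) as max_one,
       logical_and(starts_with_one) as min_one
from (select length(user_id) > 1 as id_is_present,
             regexp_contains(user_id, r'^1') as starts_with_one,
             date_time
      from `user_events.2015_02`
      ) u
group by m;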
