How do you calculate a boolean aggregate over a column in BigQuery?
I have a table of user events. I want to project those events into a new column using some predicate, and then aggregate the events per user into a new projection that tells me whether the predicate has ever matched for that user, has never matched, and so on.
In other languages these are usually called all() and any(). You pass them a list of boolean values: all() tells you whether every value is true, and any() tells you whether at least one is true. This is equivalent to combining all the values with a boolean AND (for all()) or with a boolean OR (for any()).
Does BigQuery have this feature? I can sort of approximate it using max and min, but it's not ideal.
Example:
select
  month(date_time) m,
  count(*) as ct,
  max(id_is_present),
  min(id_is_present),
  max(starts_with_one) max_one,
  min(starts_with_one) min_one,
from
  (
    select
      length(user_id) > 1 id_is_present,
      regexp_match(user_id, r'^1') starts_with_one,
      date_time
    from
      [user_events.2015_02]
  )
group by
  m
This exploits the fact that max(true, false, false) yields true (true sorts above false), so max over a boolean column acts as any and min acts as all, and you can build more complex checks from there.
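The max/min trick can be checked outside BigQuery. The sketch below uses Python's built-in `sqlite3` module as a stand-in (this is SQLite, not BigQuery; the `events` table, its `user_id`/`event_type` columns and the `'purchase'` predicate are hypothetical). Because SQLite represents a comparison result as 1 or 0, MAX over the predicate per user behaves like any() and MIN like all().

```python
import sqlite3

# Hypothetical user events; SQLite stands in for BigQuery purely for illustration
rows = [
    ("alice", "view"), ("alice", "purchase"),
    ("bob", "purchase"), ("bob", "purchase"),
    ("carol", "view"), ("carol", "view"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_type TEXT)")
conn.executemany("INSERT INTO events VALUES (?, ?)", rows)

# SQLite has no separate boolean type: the comparison yields 1 (true) or 0 (false),
# so MAX over the predicate is "any" and MIN over it is "all", per user.
query = """
SELECT user_id,
       MAX(event_type = 'purchase') AS any_purchase,
       MIN(event_type = 'purchase') AS all_purchase
FROM events
GROUP BY user_id
ORDER BY user_id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
conn.close()
```

The same shape (group by the user, then MAX/MIN over the predicate) is what the legacy SQL query above does per month.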
Is this a hack I have to rely on, or does BigQuery support boolean aggregates?
Yes. In legacy SQL, BigQuery has such aggregate functions under the SQL standard names:
EVERY (logical AND)
SOME (logical OR)
In case someone else stumbles across this: BigQuery standard SQL offers LOGICAL_AND() and LOGICAL_OR(). So the code could be written as:
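#standardSQL
select
  extract(month from date_time) as m,
  count(*) as ct,
  logical_or(id_is_present) as any_id_present,    -- true if any row in the group matches
  logical_and(id_is_present) as all_id_present,   -- true if every row in the group matches
  logical_or(starts_with_one) as any_starts_with_one,
  logical_and(starts_with_one) as all_starts_with_one
from (
  select
    length(user_id) > 1 as id_is_present,
    regexp_contains(user_id, r'^1') as starts_with_one,  -- standard SQL counterpart of legacy regexp_match
    date_time
  from `user_events.2015_02`
) as u
group by m;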
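The question asks for a per-user result, while the query above groups by month. In BigQuery that would presumably mean grouping by `user_id` instead, with LOGICAL_OR(pred) for "ever matched", LOGICAL_AND(pred) for "always matched" and NOT LOGICAL_OR(pred) for "never matched". The Python sketch below is only a model of those aggregates, not BigQuery code. It assumes the documented behaviour that LOGICAL_AND and LOGICAL_OR skip NULL inputs and return NULL when there is no non-NULL input; the event data and the helper names (`logical_and`, `logical_or`, `per_user_flags`) are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Optional

def logical_and(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # Skip NULLs (None); with no non-NULL input the result is NULL (None)
    present = [v for v in values if v is not None]
    return all(present) if present else None

def logical_or(values: Iterable[Optional[bool]]) -> Optional[bool]:
    # Same NULL handling as logical_and, but folds with OR
    present = [v for v in values if v is not None]
    return any(present) if present else None

def per_user_flags(events):
    # events: (user_id, predicate_value) pairs; grouping mimics GROUP BY user_id
    groups = defaultdict(list)
    for user, matched in events:
        groups[user].append(matched)
    flags = {}
    for user, vals in groups.items():
        ever = logical_or(vals)
        flags[user] = {
            "ever": ever,
            "always": logical_and(vals),
            # NOT NULL stays NULL in SQL three-valued logic
            "never": None if ever is None else not ever,
        }
    return flags

events = [
    ("alice", True), ("alice", False),
    ("bob", True), ("bob", True),
    ("carol", False),
    ("dave", None),
]
flags = per_user_flags(events)
for user, f in sorted(flags.items()):
    print(user, f)
```

A user whose predicate is NULL on every row (like `dave` here) gets NULL for all three flags rather than false, which is worth handling explicitly (for example with IFNULL) if the downstream projection expects strict booleans.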