Calculate mode of all column values for the same ID using BigQuery SQL
Let's say I have a BigQuery table with columns id, species, genre, and level. For the same id, species, and genre, the table can contain different level values across multiple rows.
In the end, I want one row per id, with level set to the mode of all the level values present in the original table for that id.
Example
#standardSQL
with `project.dataset.table` as (
select '123' id, 'dog' species, 'suspense' genre, 3 level union all
select '124', 'cat', 'love', 4 union all
select '123', 'dog', 'suspense', 5 union all
select '123', 'dog', 'suspense', 5
)
select *
from `project.dataset.table`
Expected outcome: the same dataset with one row for each id. For example, in the data above, level for id 123 will be 5 (the value that occurred most often).
How could I achieve this?
[Update] The above data is just an example. My actual dataset has 20 million rows and more than 4 columns.
Try this:
with `project.dataset.table` as (
select '123' id, 'dog' species, 'suspense' genre, 3 level union all
select '124', 'cat', 'love', 4 union all
select '123', 'dog', 'suspense', 5 union all
select '123', 'dog', 'suspense', 5
)
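-- For each id, keep the level with the highest count (the mode).
select id, array_agg(level order by cnt desc limit 1)[offset(0)] as mode
from (
  -- Count how many times each level occurs per id.
  select id, level, count(level) as cnt
  from `project.dataset.table`
  group by id, level
)
group by id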
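Since the expected outcome is the same dataset with one row per id, a variant that also carries species and genre through may be useful. The following is only a sketch, assuming that species and genre are constant within an id (as the question describes) and that ties between equally frequent levels should go to the higher level; the tie-break rule and the names cnt and rn are my own choices, not part of the original answer.

```sql
#standardSQL
with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level union all
  select '124', 'cat', 'love', 4 union all
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5
)
select id, species, genre, level
from (
  select
    id, species, genre, level,
    -- Rank the candidate rows of each id: most frequent level first,
    -- higher level wins a tie (assumed rule, adjust as needed).
    row_number() over (partition by id order by cnt desc, level desc) as rn
  from (
    -- Count occurrences of each distinct (id, species, genre, level) row.
    select id, species, genre, level, count(*) as cnt
    from `project.dataset.table`
    group by id, species, genre, level
  )
)
where rn = 1
```

With the sample data this should return (123, dog, suspense, 5) and (124, cat, love, 4). For a table with more columns, the extra columns would need to be added to both the group by and the select lists, which only makes sense if they do not vary within an id.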