Calculate the mode of a column's values for the same ID using BigQuery SQL

Question

Let's say I have a BigQuery table with columns id, species, genre, and level. For the same id, species, and genre, the table can have different level values across multiple rows.

In the end, I want one row per id, with level set to the mode of all the level values present in the original table for that id.

Example

#standardSQL
with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level  union all 
  select '124', 'cat', 'love', 4 union all 
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5 
)
select *
from `project.dataset.table`

Expected outcome: the same dataset with one row per id. For example, in the data above, level for id 123 will be 5 (the value that occurs most often).

How can I achieve this?

[Update] The above data is just an example. My actual dataset has 20 million rows and more than 4 columns.

Answer 1

Try this:

with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level  union all 
  select '124', 'cat', 'love', 4 union all 
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5 
)
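-- outer query: per id, keep the level with the highest count (the mode)
select id, array_agg(level order by cnt desc limit 1)[offset(0)] as mode
from (
  -- inner query: count how often each level occurs per id
  select id, level, count(level) as cnt
  from `project.dataset.table`
  group by id, level
)
group by id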
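
The query above returns only id and the mode. Since the question asks for the same dataset with one row per id, here is a minimal sketch of one way to carry the other columns along. It is not part of the original answer and rests on two assumptions: species and genre are constant for a given id (so any_value is safe to use), and when two levels are tied for the highest count, the larger level should win (the extra "level desc" sort key, which only makes the result deterministic).

```sql
#standardSQL
with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level union all
  select '124', 'cat', 'love', 4 union all
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5
)
select
  id,
  -- assumption: species and genre do not vary within an id
  any_value(species) as species,
  any_value(genre) as genre,
  -- most frequent level per id; ties are broken by the larger level (assumption)
  array_agg(level order by cnt desc, level desc limit 1)[offset(0)] as level
from (
  -- count how often each level occurs for each id
  select id, species, genre, level, count(level) as cnt
  from `project.dataset.table`
  group by id, species, genre, level
)
group by id
```

On the sample data this should give level 5 for id 123 and level 4 for id 124. For a table with more columns, each extra column would need its own any_value(...) line in the outer query and an entry in the inner group by, again assuming those columns are constant per id.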
