Calculate mode of all column values for the same ID using BigQuery SQL

Let's say I have a BigQuery table with columns id, species, genre, and level. For the same id, species, and genre, the table can contain different level values across multiple rows.

In the end, I want one row per id, with level set to the mode (the most frequent value) of all the level values in the original table for that id.

Example

#standardSQL
with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level  union all 
  select '124', 'cat', 'love', 4 union all 
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5 
)
select *
from `project.dataset.table`

Expected outcome: the same dataset with one row for each id. For example, in the data above, id 123 gets level 5, the value that occurs most often.

How can I achieve this?

[Update] The above data is just an example. My actual dataset has 20 million rows and more than 4 columns.

Try this:

with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level  union all 
  select '124', 'cat', 'love', 4 union all 
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5 
)
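-- Outer query: per id, keep the level with the highest count (the mode).
-- Ties between equally frequent levels are broken arbitrarily.
select id, array_agg(level order by cnt desc limit 1)[offset(0)] as mode
from (
  -- Inner query: count how often each level occurs for each id.
  select id, level, count(level) as cnt
  from `project.dataset.table`
  group by id, level
)
group by id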
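
The query above returns only id and the mode. Since the question asks for the same dataset with one row per id, here is a minimal sketch of a variation that also carries species and genre through. It rests on two assumptions that are not in the original post: species and genre are constant for a given id (otherwise the most frequent species/genre/level combination is picked), and ties between equally frequent levels are broken in favor of the higher level. The aliases cnt and rn are illustrative names only.

```sql
#standardSQL
with `project.dataset.table` as (
  select '123' id, 'dog' species, 'suspense' genre, 3 level union all
  select '124', 'cat', 'love', 4 union all
  select '123', 'dog', 'suspense', 5 union all
  select '123', 'dog', 'suspense', 5
)
select id, species, genre, level
from (
  -- Rank the distinct rows of each id: most frequent first,
  -- then the higher level on ties (an assumed tie-break rule).
  select id, species, genre, level,
    row_number() over (partition by id order by cnt desc, level desc) as rn
  from (
    -- Count how often each (id, species, genre, level) combination occurs.
    select id, species, genre, level, count(*) as cnt
    from `project.dataset.table`
    group by id, species, genre, level
  )
)
-- Keep only the top-ranked row per id.
where rn = 1
```

On the sample data this should keep level 5 for id 123 and level 4 for id 124. For a table with more columns, the extra columns would need to be added to both the group by list and the select lists.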


