How to select the most frequent value in a column for each id group?

I have a table in SQL that looks like this:

user_id | data1
0       | 6
0       | 6
0       | 6
0       | 1
0       | 1
0       | 2
1       | 5
1       | 5
1       | 3
1       | 3
1       | 3
1       | 7

I want to write a query that returns two columns: one for the user id and one for the value that occurs most frequently for that id. In my example, the most frequent value is 6 for user_id 0 and 3 for user_id 1. I would like the result to look like this:

user_id | most_frequent_value
0       | 6
1       | 3

I am using the query below to get the most frequent value, but it runs against the whole table and returns the most common value for the whole table instead of for each id. What do I need to add to my query so that it returns the most frequent value for each id? I think I need a subquery, but I am unsure how to structure it.

SELECT user_id, data1 AS most_frequent_value
FROM my_table
GROUP BY user_id, data1
ORDER BY COUNT(*) DESC LIMIT 1

You can use a window function to rank each data1 value within its user_id by how often it occurs, then keep only the top-ranked row for each user.

WITH cte AS (
SELECT 
    user_id 
  , data1
  , ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY COUNT(data1) DESC) rn
FROM dbo.YourTable
GROUP BY
  user_id,
  data1)

SELECT
    user_id,
    data1
FROM cte WHERE rn = 1 
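
To try the ranking approach without a database server, the same idea can be run against the sample rows from the question using Python's built-in sqlite3 module. This is only a sketch under a few assumptions: the SQLite build is 3.25 or later (needed for window functions), the table is named my_table as in the question, the counting step is split into its own CTE for readability, and data1 is added as a secondary sort key so that ties resolve deterministically.

```python
import sqlite3

# Sample rows from the question: (user_id, data1)
rows = [(0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
        (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (user_id INTEGER, data1 INTEGER)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)", rows)

query = """
WITH counts AS (
    -- one row per (user_id, data1) pair with its frequency
    SELECT user_id, data1, COUNT(*) AS n
    FROM my_table
    GROUP BY user_id, data1
),
ranked AS (
    -- rank the values inside each user_id, most frequent first;
    -- data1 as a tie-breaker is an assumption, not part of the answer
    SELECT user_id, data1,
           ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY n DESC, data1) AS rn
    FROM counts
)
SELECT user_id, data1 AS most_frequent_value
FROM ranked
WHERE rn = 1
ORDER BY user_id
"""

result = conn.execute(query).fetchall()
print(result)  # [(0, 6), (1, 3)]
```

Without a tie-breaker, ROW_NUMBER() picks one of the equally frequent values in an unspecified order; RANK() could be used instead if all tied values should be returned.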

With a suitable ORDER BY, DISTINCT ON (user_id) does the same job, because it keeps only the first row of each set of rows that share the same user_id. DISTINCT ON is specific to PostgreSQL.

select distinct on (user_id) user_id, most_frequent_value from (
SELECT user_id, data1 AS most_frequent_value, count(*) as _count
FROM my_table
GROUP BY user_id, data1) a
ORDER BY user_id, _count DESC 
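
The "sort, then keep the first row per key" behaviour that DISTINCT ON relies on can be illustrated outside the database. The following is a rough Python sketch of the same three steps (count, sort, keep the first row of each group) on the question's sample rows; it imitates the semantics and is not how PostgreSQL implements the clause.

```python
from collections import Counter
from itertools import groupby

# Sample rows from the question: (user_id, data1)
rows = [(0, 6), (0, 6), (0, 6), (0, 1), (0, 1), (0, 2),
        (1, 5), (1, 5), (1, 3), (1, 3), (1, 3), (1, 7)]

# Inner query: one (user_id, value, count) row per distinct pair.
counts = [(user_id, value, n) for (user_id, value), n in Counter(rows).items()]

# ORDER BY user_id, _count DESC
counts.sort(key=lambda r: (r[0], -r[2]))

# DISTINCT ON (user_id): keep the first row of each run of equal user_id.
first_rows = [next(group) for _, group in groupby(counts, key=lambda r: r[0])]
most_frequent = [(user_id, value) for user_id, value, _ in first_rows]
print(most_frequent)  # [(0, 6), (1, 3)]
```

As in the SQL version, the sort order decides which row survives, so the result is only well defined when the sort puts the wanted row first in every group.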

With PostgreSQL 9.4 or later you can use the mode() ordered-set aggregate:

SELECT
    user_id,
    MODE() WITHIN GROUP (ORDER BY value) AS most_frequent_value
FROM
    (VALUES (0,6), (0,6), (0,6), (0,1), (0,1), (0,2),
            (1,5), (1,5), (1,3), (1,3), (1,3), (1,7))
    users (user_id, value)
GROUP BY user_id
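
One thing to keep in mind is that PostgreSQL's mode() chooses arbitrarily when several values are equally frequent. If a predictable tie-break matters, the rule has to be spelled out. Below is a small Python sketch of a per-group mode in which the smallest value wins ties; the helper name mode_per_group and the sample rows (where user 1 has a tie between 3 and 5) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

def mode_per_group(rows):
    """Return {group_key: most frequent value}; the smallest value wins ties."""
    values_by_group = defaultdict(list)
    for key, value in rows:
        values_by_group[key].append(value)

    result = {}
    for key, values in values_by_group.items():
        counts = Counter(values)
        # Highest count first; among equal counts, the smallest value.
        result[key] = min(counts, key=lambda v: (-counts[v], v))
    return result

# Hypothetical rows: user 1 has 3 and 5 occurring twice each.
rows = [(0, 6), (0, 6), (0, 1), (1, 5), (1, 5), (1, 3), (1, 3), (1, 7)]
modes = mode_per_group(rows)
print(modes)  # {0: 6, 1: 3}
```

In SQL, the same kind of explicit tie-break can be expressed by adding a second sort key in the ranking or DISTINCT ON queries.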
