Select the N most frequent values of column B for each value of column A

Question

With a MySQL table like:

id | colA | colB
...| 1    | 13
...| 1    | 13
...| 1    | 12
...| 1    | 12
...| 1    | 11
...| 2    | 78
...| 2    | 78
...| 2    | 78
...| 2    | 13
...| 2    | 13
...| 2    | 9

For each value in colA, I want to find the N most frequent values in colB.

Example result for N=2:

colA | colB
1    | 13
1    | 12
2    | 78
2    | 13

I am able to get all unique combinations of colA and colB with their frequencies using:

SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB ORDER BY freq DESC;

Example result:

colA | colB | freq
1    | 13   | 2
1    | 12   | 2
1    | 11   | 1
2    | 78   | 3
2    | 13   | 2
2    | 9    | 1

But I struggle to apply a LIMIT for each value in colA instead of for the whole table.

This is basically like "How to select most frequent value in a column per each id group?", just for MySQL instead of PostgreSQL.

I am using MariaDB 10.1 at the moment.

Answer 1

Use window functions, if you can:

SELECT colA, colB, freq
FROM (SELECT colA, colB, COUNT(*) AS freq,
             DENSE_RANK() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) as seqnum
      FROM t
      GROUP BY colA, colB 
     ) ab
WHERE seqnum <= 2;

Note that you might want DENSE_RANK(), RANK() or ROW_NUMBER(), depending on how you want to treat ties. If five colB values share the two highest frequencies, then DENSE_RANK() will return all five.

If you want exactly two values, then use ROW_NUMBER().
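
A sketch of the ROW_NUMBER() variant, assuming the table t from the question and N = 2. The extra colB DESC sort key is an assumption added here so that ties in frequency break deterministically (higher colB first); without it, which of the tied rows is kept is left to the server. As far as I know, window functions need MySQL 8.0+ or MariaDB 10.2+.

```sql
SELECT colA, colB, freq
FROM (SELECT colA, colB, COUNT(*) AS freq,
             -- number the pairs within each colA: most frequent first,
             -- higher colB first among equal frequencies (assumed tie-break)
             ROW_NUMBER() OVER (PARTITION BY colA
                                ORDER BY COUNT(*) DESC, colB DESC) AS seqnum
      FROM t
      GROUP BY colA, colB
     ) ab
WHERE seqnum <= 2   -- N = 2
ORDER BY colA, seqnum;
```

On the sample data this should keep (1, 13), (1, 12), (2, 78) and (2, 13), which matches the result asked for in the question.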

Answer 2

You can probably use a couple of CTEs for this. Something like the following returns the most frequent colB value(s) for each colA (that is, N = 1, with ties included):

WITH counts AS (
   -- frequency of each (colA, colB) pair
   SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB
), most_freq AS (
   -- highest frequency within each colA; aliased so the join below can reference it
   SELECT colA, MAX(freq) AS freq FROM counts GROUP BY colA
)
   SELECT counts.*
     FROM counts
     JOIN most_freq ON (counts.colA = most_freq.colA
                        AND counts.freq = most_freq.freq);
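
Since the question mentions MariaDB 10.1, which to my knowledge predates both CTEs and window functions (they arrived in MariaDB 10.2 and MySQL 8.0), here is a sketch of a top-N query that uses only derived tables and a correlated subquery. It assumes the table t from the question and N = 2; breaking ties in favour of the higher colB is an assumption made here, not something the question specifies. For each (colA, colB) pair it counts how many pairs in the same colA group rank ahead of it, and keeps the pair when fewer than N do.

```sql
SELECT c1.colA, c1.colB, c1.freq
FROM (SELECT colA, colB, COUNT(*) AS freq
      FROM t
      GROUP BY colA, colB) c1
WHERE (SELECT COUNT(*)
       FROM (SELECT colA, colB, COUNT(*) AS freq
             FROM t
             GROUP BY colA, colB) c2
       WHERE c2.colA = c1.colA
         -- c2 ranks ahead of c1: higher frequency, or equal frequency
         -- and higher colB (assumed tie-break)
         AND (c2.freq > c1.freq
              OR (c2.freq = c1.freq AND c2.colB > c1.colB))
      ) < 2   -- N = 2
ORDER BY c1.colA, c1.freq DESC, c1.colB DESC;
```

The grouped derived table is evaluated in both places, so this is likely to be slow on large tables. On the sample data it should return (1, 13), (1, 12), (2, 78) and (2, 13).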

Answer 1: 1 2019-04-05 15:43:56
Answer 2: 0 2019-04-05 15:38:49
