Select N most frequent values of column B for each value of column A
Given the following MySQL table:
id | colA | colB
...| 1 | 13
...| 1 | 13
...| 1 | 12
...| 1 | 12
...| 1 | 11
...| 2 | 78
...| 2 | 78
...| 2 | 78
...| 2 | 13
...| 2 | 13
...| 2 | 9
For each value of colA, I want to find the N most frequent values of colB.
Example result for N = 2:
colA | colB
1 | 13
1 | 12
2 | 78
2 | 13
I was able to get all unique combinations of colA and colB, together with their frequencies, using:
SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB ORDER BY freq DESC;
Example result:
colA | colB | freq
1 | 13 | 2
1 | 12 | 2
1 | 11 | 1
2 | 78 | 3
2 | 13 | 2
2 | 9 | 1
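But I am struggling to apply the LIMIT to each value of colA rather than to the whole table.
This is basically the same as "How to select most frequent value in a column per each id group?", only for MySQL instead of PostgreSQL.
I am currently using MariaDB 10.1.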
If you can, use window functions (available from MariaDB 10.2 and MySQL 8.0):
SELECT colA, colB, freq
FROM (SELECT colA, colB, COUNT(*) AS freq,
DENSE_RANK() OVER (PARTITION BY colA ORDER BY COUNT(*) DESC) as seqnum
FROM t
GROUP BY colA, colB
) ab
WHERE seqnum <= 2;
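Note that, depending on how you want to treat ties, you may need DENSE_RANK(), RANK(), or ROW_NUMBER(). If five colB values share the top two ranks, DENSE_RANK() returns all five.
If you want only two values per colA, use ROW_NUMBER().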
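To make the tie behaviour concrete, here is a small sketch that runs the same kind of ranked query against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for MariaDB/MySQL (an assumption made only so the example is self-contained; window functions need SQLite 3.25 or later), and the helper name top_n is hypothetical.

```python
import sqlite3

# (colA, colB) pairs copied from the sample table in the question.
ROWS = [(1, 13), (1, 13), (1, 12), (1, 12), (1, 11),
        (2, 78), (2, 78), (2, 78), (2, 13), (2, 13), (2, 9)]


def top_n(rank_fn, n=2):
    """Return (colA, colB, freq) rows whose rank within colA is <= n."""
    # Only these three names are interpolated into the SQL text.
    if rank_fn not in ("DENSE_RANK", "RANK", "ROW_NUMBER"):
        raise ValueError("unsupported ranking function: " + rank_fn)

    # In-memory SQLite database standing in for the MariaDB table t.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, colA INT, colB INT)")
    conn.executemany("INSERT INTO t (colA, colB) VALUES (?, ?)", ROWS)

    sql = f"""
        SELECT colA, colB, freq
        FROM (SELECT colA, colB, freq,
                     {rank_fn}() OVER (PARTITION BY colA
                                       ORDER BY freq DESC) AS seqnum
              FROM (SELECT colA, colB, COUNT(*) AS freq
                    FROM t
                    GROUP BY colA, colB) counts
             ) ranked
        WHERE seqnum <= ?
        ORDER BY colA, freq DESC, colB DESC
    """
    result = conn.execute(sql, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for fn in ("DENSE_RANK", "RANK", "ROW_NUMBER"):
        print(fn, top_n(fn))
```

On this data, DENSE_RANK() with seqnum <= 2 returns three rows for colA = 1, because 13 and 12 tie at rank 1 and 11 then gets rank 2. RANK() and ROW_NUMBER() return only 13 and 12 for colA = 1.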
You could probably use a couple of CTEs for this, for example:
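WITH counts AS (
    -- frequency of every (colA, colB) pair
    SELECT colA, colB, COUNT(*) AS freq FROM t GROUP BY colA, colB
), most_freq AS (
    -- highest frequency within each colA; the alias is required by the join below
    SELECT colA, MAX(freq) AS freq FROM counts GROUP BY colA
)
-- keeps only the top-frequency colB per colA (ties included), i.e. N = 1
SELECT counts.*
FROM counts
JOIN most_freq ON (counts.colA = most_freq.colA
    AND counts.freq = most_freq.freq);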
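Window functions and CTEs both arrived in MariaDB 10.2, so on the MariaDB 10.1 mentioned in the question neither is available. One possible fallback, sketched here under assumptions, is a correlated subquery that counts how many colB values in the same colA group rank ahead of the current one and keeps the row when that count is below N. Ties are broken by the larger colB, which is an arbitrary choice made for this sketch. The query has only been exercised against SQLite through Python's sqlite3 module, not against MariaDB 10.1 itself, and the names TOP_N_SQL and top_n_without_windows are hypothetical.

```python
import sqlite3

# Plain SQL without window functions or CTEs. The derived table is repeated
# because it cannot be named once and reused without a CTE.
TOP_N_SQL = """
    SELECT c1.colA, c1.colB, c1.freq
    FROM (SELECT colA, colB, COUNT(*) AS freq
          FROM t GROUP BY colA, colB) c1
    WHERE (SELECT COUNT(*)
           FROM (SELECT colA, colB, COUNT(*) AS freq
                 FROM t GROUP BY colA, colB) c2
           WHERE c2.colA = c1.colA
             AND (c2.freq > c1.freq
                  OR (c2.freq = c1.freq AND c2.colB > c1.colB))
          ) < ?
    ORDER BY c1.colA, c1.freq DESC, c1.colB DESC
"""


def top_n_without_windows(rows, n=2):
    """Return the n most frequent colB values per colA as (colA, colB, freq)."""
    # In-memory SQLite database standing in for the MariaDB table t.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, colA INT, colB INT)")
    conn.executemany("INSERT INTO t (colA, colB) VALUES (?, ?)", rows)
    result = conn.execute(TOP_N_SQL, (n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # (colA, colB) pairs copied from the sample table in the question.
    sample = [(1, 13), (1, 13), (1, 12), (1, 12), (1, 11),
              (2, 78), (2, 78), (2, 78), (2, 13), (2, 13), (2, 9)]
    for row in top_n_without_windows(sample):
        print(row)
```

Because the tie-break makes the ordering within each colA strict, this behaves like the ROW_NUMBER() variant and returns at most N rows per group. The repeated aggregation may be slow on large tables.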