MySQL SELECT most frequent by group

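How do I get the most frequently occurring category for each tag in MySQL? Ideally, I'd like to simulate an aggregate function that computes the mode of a column.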

SELECT 
  t.tag 
  , s.category 
FROM tags t 
LEFT JOIN stuff s 
USING (id) 
ORDER BY tag;

+------------------+----------+
| tag              | category |
+------------------+----------+
| automotive       |        8 |
| ba               |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |        8 |
| bamboo           |       10 |
| bamboo           |        8 |
| bamboo           |        9 |
| bamboo           |        8 |
| bamboo           |       10 |
| bamboo           |        8 |
| bamboo           |        9 |
| bamboo           |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| banana tree      |        8 |
| bath             |        9 |
+------------------+----------+

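
For reference, this small Python sketch shows what the hoped-for MODE() aggregate would compute over the rows above. MySQL has no MODE() aggregate. mode_by_group is a hypothetical helper used only for illustration, and it breaks ties in favour of the category seen first.

```python
from collections import Counter, defaultdict

# The (tag, category) rows from the result table above.
ROWS = [
    ("automotive", 8), ("ba", 8),
    ("bamboo", 8), ("bamboo", 8), ("bamboo", 8), ("bamboo", 8), ("bamboo", 8),
    ("bamboo", 10), ("bamboo", 8), ("bamboo", 9), ("bamboo", 8), ("bamboo", 10),
    ("bamboo", 8), ("bamboo", 9), ("bamboo", 8),
    ("banana tree", 8), ("banana tree", 8), ("banana tree", 8), ("banana tree", 8),
    ("bath", 9),
]


def mode_by_group(rows):
    """Return {group: most frequent value}, like a hypothetical MODE() ... GROUP BY."""
    counters = defaultdict(Counter)
    for group, value in rows:
        counters[group][value] += 1
    # most_common(1) returns [(value, count)]; on ties the first-seen value wins.
    return {group: c.most_common(1)[0][0] for group, c in counters.items()}


print(mode_by_group(ROWS))
```
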
SELECT t1.*
FROM (SELECT tag, category, COUNT(*) AS count
      FROM tags INNER JOIN stuff USING (id)
      GROUP BY tag, category) t1
LEFT OUTER JOIN 
     (SELECT tag, category, COUNT(*) AS count
      FROM tags INNER JOIN stuff USING (id)
      GROUP BY tag, category) t2
  ON (t1.tag = t2.tag AND (t1.count < t2.count 
      OR t1.count = t2.count AND t1.category < t2.category))
WHERE t2.tag IS NULL
ORDER BY t1.count DESC;

I agree this is kind of too much for a single SQL query. Any use of GROUP BY inside a subquery makes me wince. You can make it look simpler by using a view:

CREATE VIEW count_per_category AS
    SELECT tag, category, COUNT(*) AS count
    FROM tags INNER JOIN stuff USING (id)
    GROUP BY tag, category;

SELECT t1.*
FROM count_per_category t1
LEFT OUTER JOIN count_per_category t2
  ON (t1.tag = t2.tag AND (t1.count < t2.count 
      OR t1.count = t2.count AND t1.category < t2.category))
WHERE t2.tag IS NULL
ORDER BY t1.count DESC;
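
A quick way to check the self-join is to run it somewhere disposable. This sketch uses Python's built-in sqlite3 module as a stand-in for MySQL; that is an assumption, but the view and the join use plain SQL that SQLite also accepts. It loads rows matching the question's table. The t1.tag in the ORDER BY was added only to make the output order deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the view and self-join below
# use only SQL that both engines accept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE stuff (id INTEGER, category INTEGER);

    CREATE VIEW count_per_category AS
        SELECT tag, category, COUNT(*) AS count
        FROM tags INNER JOIN stuff USING (id)
        GROUP BY tag, category;
""")

# Rows reproducing the (tag, category) table in the question.
pairs = ([("automotive", 8), ("ba", 8)]
         + [("bamboo", c) for c in (8, 8, 8, 8, 8, 10, 8, 9, 8, 10, 8, 9, 8)]
         + [("banana tree", 8)] * 4
         + [("bath", 9)])
for i, (tag, category) in enumerate(pairs, start=1):
    conn.execute("INSERT INTO tags (id, tag) VALUES (?, ?)", (i, tag))
    conn.execute("INSERT INTO stuff (id, category) VALUES (?, ?)", (i, category))

# t1 survives only when no t2 with the same tag has a higher count,
# or the same count and a larger category: one winner per tag.
rows = conn.execute("""
    SELECT t1.*
    FROM count_per_category t1
    LEFT OUTER JOIN count_per_category t2
      ON (t1.tag = t2.tag AND (t1.count < t2.count
          OR t1.count = t2.count AND t1.category < t2.category))
    WHERE t2.tag IS NULL
    ORDER BY t1.count DESC, t1.tag
""").fetchall()

for row in rows:
    print(row)
```

Each tag keeps exactly one row. On equal counts, the ON clause lets the larger category win.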

But it's basically doing the same work behind the scenes.

You commented that you could easily do something similar in application code. So why not do that? Run the simpler query to get the count per category:

SELECT tag, category, COUNT(*) AS count
FROM tags INNER JOIN stuff USING (id)
GROUP BY tag, category;

And sort through the results in application code.
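
Here is a minimal Python sketch of that application-side step. It assumes the query's rows arrive as (tag, category, count) tuples; the list below holds the counts implied by the question's sample data. most_frequent_per_tag is a hypothetical name. One pass with a dictionary is enough, so nothing actually has to be sorted.

```python
# (tag, category, count) rows, shaped like the GROUP BY query's result;
# the counts are those implied by the question's sample data.
grouped = [
    ("automotive", 8, 1),
    ("ba", 8, 1),
    ("bamboo", 8, 9),
    ("bamboo", 9, 2),
    ("bamboo", 10, 2),
    ("banana tree", 8, 4),
    ("bath", 9, 1),
]


def most_frequent_per_tag(rows):
    """Keep, per tag, the category with the highest count (first seen wins ties)."""
    best = {}  # tag -> (category, count)
    for tag, category, count in rows:
        if tag not in best or count > best[tag][1]:
            best[tag] = (category, count)
    return {tag: category for tag, (category, _) in best.items()}


print(most_frequent_per_tag(grouped))
```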

SELECT  tag, category
FROM    (
        SELECT  @tag <> tag AS _new,
                @tag := tag AS tag,
                category, COUNT(*) AS cnt
        FROM    (
                SELECT  @tag := ''
                ) vars,
                stuff
        GROUP BY
                tag, category
        ORDER BY
                tag, cnt DESC
        ) q
WHERE   _new;

On your data, this returns the following:

'automotive',  8
'ba',          8
'bamboo',      8
'bananatree',  8
'bath',        9

Here is the test script:

CREATE TABLE stuff (tag VARCHAR(20) NOT NULL, category INT NOT NULL);

INSERT
INTO    stuff
VALUES
('automotive',8),
('ba',8),
('bamboo',8),
('bamboo',8),
('bamboo',8),
('bamboo',8),
('bamboo',8),
('bamboo',10),
('bamboo',8),
('bamboo',9),
('bamboo',8),
('bamboo',10),
('bamboo',8),
('bamboo',9),
('bamboo',8),
('bananatree',8),
('bananatree',8),
('bananatree',8),
('bananatree',8),
('bath',9);

(Edit: I forgot the DESC in the ORDER BY.)
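
The query above sorts the grouped rows by tag, cnt DESC. It then keeps a row only when the tag differs from the previous one, which is what @tag <> tag detects. Below is the same logic in Python, using itertools.groupby over the grouped counts of the test data. It illustrates the steps only and does not model how MySQL evaluates user variables.

```python
from itertools import groupby

# (tag, category, cnt) rows as the inner GROUP BY tag, category would
# produce them for the test data; deliberately left unsorted here.
grouped = [
    ("bamboo", 10, 2), ("bath", 9, 1), ("bamboo", 8, 9),
    ("ba", 8, 1), ("bananatree", 8, 4), ("bamboo", 9, 2),
    ("automotive", 8, 1),
]

# ORDER BY tag, cnt DESC
ordered = sorted(grouped, key=lambda row: (row[0], -row[2]))

# WHERE _new: keep the first row of each run of equal tags, i.e. the row
# where the tag differs from the previous one.
result = [next(run)[:2] for _, run in groupby(ordered, key=lambda row: row[0])]

print(result)
```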

This is easy with LIMIT in a subquery. Does MySQL still have the no-LIMIT-in-subqueries restriction? The example below uses PostgreSQL.

=> select tag,
          (select category from stuff z
           where z.tag = s.tag
           group by tag, category
           order by count(*) DESC
           limit 1) AS category,
          (select count(*) from stuff z
           where z.tag = s.tag
           group by tag, category
           order by count(*) DESC
           limit 1) AS num_items
   from stuff s
   group by tag;
    tag     | category | num_items 
------------+----------+-----------
 ba         |        8 |         1
 automotive |        8 |         1
 bananatree |        8 |         4
 bath       |        9 |         1
 bamboo     |        8 |         9
(5 rows)

The third column is only needed if you want the count.
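
The same correlated-subquery query can be tried from Python's built-in sqlite3 module, which, like PostgreSQL, accepts LIMIT inside a scalar subquery. This sketch uses SQLite as a stand-in for PostgreSQL and loads the test data from the previous answer. The ORDER BY tag was added only to fix the row order.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; it also allows LIMIT in a
# correlated scalar subquery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stuff (tag TEXT NOT NULL, category INTEGER NOT NULL)")
data = ([("automotive", 8), ("ba", 8)]
        + [("bamboo", c) for c in (8, 8, 8, 8, 8, 10, 8, 9, 8, 10, 8, 9, 8)]
        + [("bananatree", 8)] * 4
        + [("bath", 9)])
conn.executemany("INSERT INTO stuff VALUES (?, ?)", data)

# For each tag, the subqueries group that tag's rows by category and keep
# the top group by count: once for the category, once for its count.
rows = conn.execute("""
    SELECT tag,
           (SELECT category FROM stuff z
            WHERE z.tag = s.tag
            GROUP BY tag, category
            ORDER BY count(*) DESC
            LIMIT 1) AS category,
           (SELECT count(*) FROM stuff z
            WHERE z.tag = s.tag
            GROUP BY tag, category
            ORDER BY count(*) DESC
            LIMIT 1) AS num_items
    FROM stuff s
    GROUP BY tag
    ORDER BY tag
""").fetchall()

for row in rows:
    print(row)
```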

This is for the simpler case, finding the most frequent value of a single column:

SELECT action, COUNT(action) AS ActionCount FROM log GROUP BY action ORDER BY ActionCount DESC;

Here is a hacky approach that uses the max aggregate function, since MySQL has no mode aggregate function (or window functions, etc.) that would allow this:

SELECT  
  tag, 
  convert(substring(max(concat(lpad(c, 20, '0'), category)), 21), signed) 
        AS most_frequent_category 
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff using (id) 
    GROUP BY tag, category
) as grouped_cats 
GROUP BY tag;

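Basically, it exploits the fact that we can take the lexical maximum of the counts of each individual category. Every count is left-padded with zeros to the same width (20 characters), so comparing the concatenated strings compares the counts numerically first. The category appended after the count then comes along with the winner. When two counts tie, the lexically greater category name wins.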
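
The string trick can be mimicked in Python to see why the padding matters. In this sketch, str.zfill stands in for lpad(c, 20, '0'), and the bamboo counts are taken from the grouped_cats output. The skewed pairs are hypothetical. The second function drops the padding to show what goes wrong when counts have different numbers of digits.

```python
def padded_max(pairs, width=20):
    """Mimic substring(max(concat(lpad(c, width, '0'), category)), width + 1)."""
    # zfill pads the count with leading zeros, like lpad(c, width, '0')
    # (for non-negative counts with at most `width` digits).
    keys = [str(count).zfill(width) + category for category, count in pairs]
    # Equal-width numeric prefixes make string order agree with count order,
    # so the maximum string belongs to the highest count; slice off the prefix.
    return max(keys)[width:]


def unpadded_max(pairs):
    """The same idea without padding; returns the winning string itself."""
    return max(str(count) + category for category, count in pairs)


bamboo = [("cat-8", 9), ("cat-10", 2), ("cat-9", 2)]  # counts from grouped_cats
skewed = [("cat-8", 10), ("cat-9", 9)]  # hypothetical counts of different lengths

print(padded_max(bamboo))    # cat-8
print(padded_max(skewed))    # cat-8
print(unpadded_max(skewed))  # 9cat-9: "9" sorts after "10" as a string
```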

This is easier to see with named categories:

create temporary table tags (id int auto_increment primary key, tag character varying(20));
create temporary table stuff (id int, category character varying(20));
insert into tags (tag) values ('automotive'), ('ba'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('bamboo'), ('banana tree'), ('banana tree'), ('banana tree'), ('banana tree'), ('bath');
insert into stuff (id, category) values (1, 'cat-8'), (2, 'cat-8'), (3, 'cat-8'), (4, 'cat-8'), (5, 'cat-8'), (6, 'cat-8'), (7, 'cat-8'), (8, 'cat-10'), (9, 'cat-8'), (10, 'cat-9'), (11, 'cat-8'), (12, 'cat-10'), (13, 'cat-8'), (14, 'cat-9'), (15, 'cat-8'), (16, 'cat-8'), (17, 'cat-8'), (18, 'cat-8'), (19, 'cat-8'), (20, 'cat-9');

In this case we should not convert the most_frequent_category column to an integer:

SELECT 
  tag, 
  substring(max(concat(lpad(c, 20, '0'), category)), 21) AS most_frequent_category 
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff using (id) 
    GROUP BY tag, category
) as grouped_cats 
GROUP BY tag;

+-------------+------------------------+
| tag         | most_frequent_category |
+-------------+------------------------+
| automotive  | cat-8                  |
| ba          | cat-8                  |
| bamboo      | cat-8                  |
| banana tree | cat-8                  |
| bath        | cat-9                  |
+-------------+------------------------+

To get a better sense of what's going on, here is what the inner grouped_cats select looks like (I added order by tag, c desc):

+-------------+----------+---+
| tag         | category | c |
+-------------+----------+---+
| automotive  | cat-8    | 1 |
| ba          | cat-8    | 1 |
| bamboo      | cat-8    | 9 |
| bamboo      | cat-10   | 2 |
| bamboo      | cat-9    | 2 |
| banana tree | cat-8    | 4 |
| bath        | cat-9    | 1 |
+-------------+----------+---+

If we leave out the substring part, we can see how the maximum of the count(*) column drags its associated category along with it:

SELECT 
  tag, 
  max(concat(lpad(c, 20, '0'), category)) AS xmost_frequent_category
FROM (
    SELECT tag, category, count(*) AS c
    FROM tags INNER JOIN stuff using (id) 
    GROUP BY tag, category
) as grouped_cats 
GROUP BY tag;

+-------------+---------------------------+
| tag         | xmost_frequent_category   |
+-------------+---------------------------+
| automotive  | 00000000000000000001cat-8 |
| ba          | 00000000000000000001cat-8 |
| bamboo      | 00000000000000000009cat-8 |
| banana tree | 00000000000000000004cat-8 |
| bath        | 00000000000000000001cat-9 |
+-------------+---------------------------+
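
To run the named-category query end to end without a MySQL server, here is a sketch using Python's built-in sqlite3 module. SQLite has no lpad(), so this is an assumption-laden swap: printf('%020d', c) || category replaces concat(lpad(c, 20, '0'), category), and substr() replaces substring(). The data is the named-category script above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tags  (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE stuff (id INTEGER, category TEXT);
""")

# Same rows as the named-category script above (ids 1..20).
tag_names = ["automotive", "ba"] + ["bamboo"] * 13 + ["banana tree"] * 4 + ["bath"]
categories = (["cat-8"] * 7
              + ["cat-10", "cat-8", "cat-9", "cat-8", "cat-10", "cat-8", "cat-9", "cat-8"]
              + ["cat-8"] * 4
              + ["cat-9"])
for i, (tag, category) in enumerate(zip(tag_names, categories), start=1):
    conn.execute("INSERT INTO tags (id, tag) VALUES (?, ?)", (i, tag))
    conn.execute("INSERT INTO stuff (id, category) VALUES (?, ?)", (i, category))

# printf('%020d', c) || category stands in for MySQL's concat(lpad(c, 20, '0'), category);
# substr(..., 21) strips the 20-character count prefix again.
rows = conn.execute("""
    SELECT tag,
           substr(max(printf('%020d', c) || category), 21) AS most_frequent_category
    FROM (
        SELECT tag, category, count(*) AS c
        FROM tags INNER JOIN stuff USING (id)
        GROUP BY tag, category
    ) AS grouped_cats
    GROUP BY tag
    ORDER BY tag
""").fetchall()

for row in rows:
    print(row)
```

The printed rows should match the most_frequent_category table above.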

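Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM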