简体   繁体   中英

MySQL how to select first row each group with count

I have a table like this (simplified version):

+------+-------+-----+--------------+-----+
| id   | name  | age | company.name | ...   
+------+-------+-----+--------------------+
| 1    | Adam  | 21  |  Google      | ...                  
| 3    | Peter | 20  |  Apple       | ...                 
| 2    | Bob   | 20  |  Microsoft   | ...    
| 9    | Alice | 18  |  Google      | ... 
+------+-------+-----+--------------------+

I need groups data with counting rows by any one column. And I need to get first row in each group. User select which column will be used to group.

If user select column age to group then results:

+------+------------+-------+
| id   | group_name | count | 
+------+------------+-------+
| 9    | 18         |  1    |
+------+------------+-------+
| 2    | 20         |  2    |
+------+------------+-------+
| 1    | 21         |  1    |
+------+------------+-------+

Column to group may be numeric or string.

Currently I does it by this query:

SELECT id, group_name, users_name, count(id) as count FROM (
 SELECT persons.id as id, company.type as group_name, users.name as users_name 
 FROM persons  
 LEFT JOIN company on company.id = persons.company_id 
 LEFT JOIN position on position.id=persons.position_id 
 ...
 LEFT JOIN source on source.id=persons.source_id 
 WHERE ...  
 ORDER BY if(company.type = '' or company.type is null,1,0) ASC,
 company.type ASC, IF(persons.status = '' or persons.status is null,1,0) ASC, 
 persons.status ASC, persons.id
) t1 GROUP BY group_name

but with new version mysql this SQL stoped works I think that order is ignored in sub-select.

I know that similar topics was wroted, but proposed solutions not working with my query. I have to join many tables, add multiple conditions and use cascade order and then select first row from each group. I will be very happy if solution will be optimised for performace.

---- EDIT ----

Proposed solution: SQL select only rows with max value on a column

which suggest to use MAX() and GROUP BY not working well. For two reason

  1. If grouped column include string, then query return not first row, but last row in each group.
  2. If my dataset has a cascade order, I can not use MAX in a few columns at the same time.

I created sqlfiddle which include exact example.

http://sqlfiddle.com/#!9/23225d/11/0

-- EXAMPLE 1 - Group by string 
-- base query
SELECT persons.*, company.* FROM persons 
LEFT JOIN company ON persons.company_id = company.id
ORDER BY company.name ASC, company.id ASC;

--   grouping query
SELECT MAX(persons.id) as id, company.name, count(persons.id) as count
FROM persons
LEFT JOIN company ON persons.company_id = company.id
GROUP BY company.name
ORDER BY company.name ASC, persons.id ASC;

-- The results will be: 
-- |ID | NAME     | COUNT|
-- |1  | Google   | 2    |
-- |3  | Microsoft| 3    |

-- EXAMPLE 2 - Cascade order
-- base query
SELECT persons.*, company.* FROM persons 
LEFT JOIN company ON persons.company_id = company.id
ORDER BY company.type ASC, persons.status ASC;

--  grouping query 
SELECT MAX(persons.id) as id, company.type, count(persons.id) as count
FROM persons
LEFT JOIN company ON persons.company_id = company.id
GROUP BY company.type
ORDER BY company.type ASC, persons.status ASC;

-- The results will be: 
-- |ID | NAME| COUNT|
-- |3  |  1  |   2  |
-- |2  |  2  |   3  |

Just change MAX() to MIN() to get the first row instead of the last row in each group.

To get the extreme values of cascading columns, see SQL : Using GROUP BY and MAX on multiple columns . Use that in the subquery part of the query to get the row containing those extremes, as in SQL select only rows with max value on a column .

So the form of the full query is:

SELECT t1.id, t1.grouped_column, t2.count
FROM yourTable AS t
JOIN (SELECT t3.grouped_column, t3.order_column1, MIN(t4.order_column2) AS order_column2, SUM(t3.count) AS count
      FROM (SELECT grouped_column, MIN(order_column1) AS order_column1, COUNT(*) AS count
            FROM yourTable
            GROUP BY grouped_column) AS t3
      JOIN yourTable AS t4 
      ON t3.grouped_column = t4.grouped_column AND t3.order_column1 = t4.order_column1
      GROUP BY t4.grouped_column, t4.order_column1) AS t2
ON t1.grouped_column = t2.grouped_column AND t1.ordered_column1 = t2.order_column1 AND t1.order_column2 = t2.order_column2

Since you want to operate on a join, I suggest you define a view that uses the join. Then you can use that view in place of yourTable in the above query.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM