Why isn't GROUP BY returning the desired records?

Given a table of:

invitations (id, created_at, type, email)

Given the following query:

SELECT DISTINCT email_address
FROM invitations
WHERE type = 'email'
AND date(created_at) in (curdate() - interval 3 day, 
                           curdate() - interval 8 day,
                           curdate() - interval 13 day,
                           curdate() - interval 21 day,
                           curdate() - interval 34 day,
                           curdate() - interval 50 day);

Given my query above, I get several hundred records. What I would like is to get the records grouped by email, meaning a unique email should exist at most once.

I therefore added:

GROUP BY email_address

This results in only 2 records when I run the query... Am I using GROUP BY incorrectly?

There's no point in writing SELECT DISTINCT email ... GROUP BY email. Use either DISTINCT or GROUP BY; both give the same result.

If you got two email addresses back, then the set of records selected by your WHERE clause only contained two unique email addresses.
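One way to check this is to compare the total number of matching rows with the number of distinct addresses among them. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL; the sample rows are made up, the date filter is left out for brevity, and the column names follow the table given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invitations ("
    "id INTEGER PRIMARY KEY, created_at TEXT, type TEXT, email TEXT)"
)

# Hypothetical sample data: five 'email' invitations sent to only two
# distinct addresses, plus one row of another type that the filter skips.
rows = [
    ("2019-01-13", "email", "jonny@gmail.com"),
    ("2019-01-18", "email", "jonny@gmail.com"),
    ("2019-01-23", "email", "jonny@gmail.com"),
    ("2019-01-23", "email", "jenny@hotmail.com"),
    ("2019-01-26", "email", "jenny@hotmail.com"),
    ("2019-01-26", "sms", "greg@yahoo.com"),
]
conn.executemany(
    "INSERT INTO invitations (created_at, type, email) VALUES (?, ?, ?)", rows
)

# COUNT(*) counts every matching row; COUNT(DISTINCT email) counts each
# address once, which is how many rows DISTINCT or GROUP BY would return.
total_rows, unique_emails = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT email) "
    "FROM invitations WHERE type = 'email'"
).fetchone()
print(total_rows, unique_emails)
```

If the second number is as small as the result you are seeing, the grouping is working and the data simply contains that few unique addresses.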

GROUP BY is used to aggregate data where some combination of columns is the same. Any other columns outside this combination must be contained within an aggregation function such as AVG, MAX or SUM.

An example:

SELECT city, ethnicity, MAX(age), AVG(salary)
FROM people
GROUP BY city, ethnicity

It will return each unique city/ethnicity pair, together with the oldest age and the average salary for that group.
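To see the query above run against actual rows, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The people table layout and every value in it are invented for illustration; an ORDER BY is added only so the printed order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (city TEXT, ethnicity TEXT, age INTEGER, salary REAL)"
)
# Hypothetical rows: two people share the (Leeds, A) combination.
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?, ?)",
    [
        ("Leeds", "A", 30, 20000.0),
        ("Leeds", "A", 50, 40000.0),
        ("Leeds", "B", 41, 35000.0),
        ("York", "A", 25, 28000.0),
    ],
)

# One output row per distinct (city, ethnicity) pair; age and salary are
# collapsed by the aggregate functions.
grouped = conn.execute(
    "SELECT city, ethnicity, MAX(age), AVG(salary) "
    "FROM people "
    "GROUP BY city, ethnicity "
    "ORDER BY city, ethnicity"
).fetchall()

for row in grouped:
    print(row)
```

The two (Leeds, A) rows collapse into one row carrying the larger age and the mean of the two salaries.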

DISTINCT is essentially a less capable GROUP BY: it doesn't let you run aggregation functions on other columns; it just returns the unique set of rows across all selected columns. I consider it a convenient shorthand, so these two queries are equivalent:

SELECT DISTINCT city, ethnicity
FROM people

SELECT city, ethnicity
FROM people
GROUP BY city, ethnicity -- more to type
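The equivalence of these two queries can be checked on a small table. This is a sketch using Python's sqlite3 module as a stand-in for MySQL, with invented rows; the results are compared as sets because neither query promises a particular row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (city TEXT, ethnicity TEXT)")
# Hypothetical rows containing duplicate (city, ethnicity) pairs.
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        ("Leeds", "A"),
        ("Leeds", "A"),
        ("Leeds", "B"),
        ("York", "A"),
        ("York", "A"),
    ],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT city, ethnicity FROM people"
).fetchall()
group_rows = conn.execute(
    "SELECT city, ethnicity FROM people GROUP BY city, ethnicity"
).fetchall()

# Same unique pairs from both queries, regardless of the order returned.
same = set(distinct_rows) == set(group_rows)
print(len(distinct_rows), len(group_rows), same)
```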

I'm assuming here that you want the records group-concatenated as such, or similar, based on your inclusion of the created_at field:

SELECT email, GROUP_CONCAT( DISTINCT created_at ORDER BY created_at DESC ) AS list_of_dates FROM invitations GROUP BY email;

to produce something along the lines of:

email              |  list_of_dates  
==========================================================  
jenny@hotmail.com  |  01/26/2019, 01/23/2019  
jonny@gmail.com    |  01/23/2019, 01/18/2019, 01/13/2019  
greg@yahoo.com     |  01/13/2019  

Or, alternatively:

SELECT created_at, GROUP_CONCAT( DISTINCT email ORDER BY created_at DESC SEPARATOR '; ' ) AS list_of_emails FROM invitations GROUP BY created_at;

created_at  |  list_of_emails  
====================================  
01/26/2019  |  jenny@hotmail.com  
01/23/2019  |  jenny@hotmail.com; jonny@gmail.com  
01/18/2019  |  jonny@gmail.com  
01/13/2019  |  jonny@gmail.com; greg@yahoo.com  

In the context of GROUP_CONCAT(), a GROUP BY is optional and is used to partition the result into one concatenated row per group. If no GROUP BY clause is specified, it'll concatenate everything into a single row.

  • The DISTINCT keyword inside the GROUP_CONCAT() must precede the concatenation field(s) and will give only the unique field values back
  • The optional ORDER BY follows the concatenation field(s) and works the same as in SELECT queries
  • SEPARATOR is also optional but must follow ORDER BY if also used; this allows you to specify the joining string used in concatenation
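To make the effect of these three options concrete, here is a rough plain-Python emulation of what GROUP_CONCAT( DISTINCT created_at ORDER BY created_at DESC SEPARATOR ', ' ) computes for each email. The helper name group_concat and the sample rows are hypothetical; this only illustrates the semantics and says nothing about how MySQL implements them.

```python
from collections import defaultdict


def group_concat(rows, distinct=True, descending=True, separator=","):
    """Emulate GROUP_CONCAT over (group_key, value) pairs.

    Hypothetical helper for illustration. Unlike MySQL, it always sorts
    the values, so it models the case where ORDER BY is present.
    """
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)          # GROUP BY key

    result = {}
    for key, values in groups.items():
        if distinct:
            values = set(values)           # DISTINCT: drop repeated values
        ordered = sorted(values, reverse=descending)  # ORDER BY value [DESC]
        result[key] = separator.join(ordered)         # SEPARATOR
    return result


# Made-up (email, created_at) pairs; ISO dates sort correctly as strings.
rows = [
    ("jonny@gmail.com", "2019-01-13"),
    ("jonny@gmail.com", "2019-01-23"),
    ("jonny@gmail.com", "2019-01-18"),
    ("jonny@gmail.com", "2019-01-23"),  # duplicate, removed by DISTINCT
    ("jenny@hotmail.com", "2019-01-26"),
    ("jenny@hotmail.com", "2019-01-23"),
    ("greg@yahoo.com", "2019-01-13"),
]

lists = group_concat(rows, separator=", ")
for email, dates in lists.items():
    print(email, "|", dates)
```

The default separator in the helper is a bare comma, mirroring MySQL's default when SEPARATOR is omitted.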

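Depending on how many records there are and how long the concatenated string might become, you may need to raise the maximum result length, because longer results are truncated (the default limit is 1024). You can adjust it by setting the session variable: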

SET SESSION group_concat_max_len=<max length>

Either way, use GROUP BY on whichever field you want to summarize and include the others inside GROUP_CONCAT. Hopefully this helps.
