Given the following data set, how would I find the email addresses that were references for the most ApplicationIDs that have an "Accepted" decision?
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10@test.org'), (1, 'ref11@test.org'), (1, 'ref12@test.org'),
(2, 'ref20@test.org'), (2, 'ref21@test.org'), (2, 'ref22@test.org'),
(3, 'ref11@test.org'), (3, 'ref31@test.org'), (3, 'ref32@test.org'),
(4, 'ref40@test.org'), (4, 'ref41@test.org'), (4, 'ref42@test.org'),
(5, 'ref50@test.org'), (5, 'ref51@test.org'), (5, 'ref52@test.org'),
(6, 'ref60@test.org'), (6, 'ref11@test.org'), (6, 'ref62@test.org'),
(7, 'ref70@test.org'), (7, 'ref71@test.org'), (7, 'ref72@test.org'),
(8, 'ref10@test.org'), (8, 'ref81@test.org'), (8, 'ref82@test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
Fiddle of same: http://sqlfiddle.com/#!9/03bcf2/1
Initially, I was using LIMIT 1 and ORDER BY CountDecision DESC, like so:
SELECT er.email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
ORDER BY CountDecision DESC
LIMIT 1
;
However, it occurred to me that several email addresses could tie for the most "Accepted" decisions, and all but one of them would be cut off (is that the right phrasing?) by the LIMIT keyword.
I then tried a variation on the above query, replacing the ORDER BY and LIMIT lines with:
HAVING MAX(CountDecision)
But I realized that that's only half a statement: MAX(CountDecision) needs to be compared to something. I just don't know what.
Any pointers would be much appreciated. Thanks!
Note: this is for a homework assignment.
Update: To be clear, I'm trying to find the value and count of Emails from EmailReferences. However, I only want rows that have FinalDecision.Decision = 'Accepted' (on matching ApplicationIDs). Based on my data, the result should be:
Email | CountDecision
---------------+--------------
ref10@test.org | 2
ref11@test.org | 2
Basically you need to do two things: first find the maximum count, then find the records that have that count.
You can combine these two steps in a single nested query, or store the result in a variable and use it in a second query. Personally I try to avoid inner queries, as they can cause performance issues and are harder to read, so I am using the variable option here:
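-- Find out what the max count is and store it in a variable.
-- SELECT ... INTO assigns only the single row left after ORDER BY / LIMIT;
-- assigning with := in the SELECT list is evaluated per row in an undefined order.
SELECT COUNT(fd.Decision) INTO @maxcount
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.Email
ORDER BY COUNT(fd.Decision) DESC
LIMIT 1;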
-- get emails with @maxcount
SELECT er.Email, COUNT(fd.Decision) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.email
HAVING COUNT(fd.Decision) = @maxcount;
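For comparison, the single nested query mentioned above could look roughly like the sketch below. It assumes the same two tables as in the question; the aliases `er2`, `fd2`, `counts` and `cnt` are made up for illustration. The inner derived table computes the per-email counts, `MAX(cnt)` picks the highest one, and the outer `HAVING` keeps every email that reaches it, so ties are preserved.

```sql
-- Sketch only: one-statement equivalent of the two queries above.
-- Aliases er2, fd2, counts and cnt are illustrative, not from the original.
SELECT er.Email, COUNT(*) AS CountDecision
FROM EmailReferences AS er
JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
WHERE fd.Decision = 'Accepted'
GROUP BY er.Email
HAVING COUNT(*) = (
    -- Highest number of accepted applications for any single email
    SELECT MAX(cnt)
    FROM (
        SELECT COUNT(*) AS cnt
        FROM EmailReferences AS er2
        JOIN FinalDecision AS fd2 ON er2.ApplicationID = fd2.ApplicationID
        WHERE fd2.Decision = 'Accepted'
        GROUP BY er2.Email
    ) AS counts
);
```

Whether this or the variable version performs better likely depends on the data size and the MySQL version, so it is worth checking with EXPLAIN on real data.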
MySQL still lacks window functions, but when version 8 is production ready, this becomes easier. So for future reference, or for databases like MariaDB that already have window functions:
CREATE TABLE IF NOT EXISTS `EmailReferences` (
`ApplicationID` INT NOT NULL,
`Email` VARCHAR(45) NOT NULL,
PRIMARY KEY (`ApplicationID`, `Email`)
);
INSERT INTO EmailReferences (ApplicationID, Email)
VALUES
(1, 'ref10@test.org'), (1, 'ref11@test.org'), (1, 'ref12@test.org'),
(2, 'ref20@test.org'), (2, 'ref21@test.org'), (2, 'ref22@test.org'),
(3, 'ref30@test.org'), (3, 'ref31@test.org'), (3, 'ref32@test.org'),
(4, 'ref40@test.org'), (4, 'ref41@test.org'), (4, 'ref42@test.org'),
(5, 'ref50@test.org'), (5, 'ref51@test.org'), (5, 'ref52@test.org'),
(6, 'ref60@test.org'), (6, 'ref11@test.org'), (6, 'ref62@test.org'),
(7, 'ref70@test.org'), (7, 'ref71@test.org'), (7, 'ref72@test.org'),
(8, 'ref10@test.org'), (8, 'ref81@test.org'), (8, 'ref82@test.org')
;
CREATE TABLE IF NOT EXISTS `FinalDecision` (
`ApplicationID` INT NOT NULL,
`Decision` ENUM('Accepted', 'Denied') NOT NULL,
PRIMARY KEY (`ApplicationID`)
);
INSERT INTO FinalDecision (ApplicationID, Decision)
VALUES
(1, 'Accepted'), (2, 'Denied'),
(3, 'Accepted'), (4, 'Denied'),
(5, 'Denied'), (6, 'Denied'),
(7, 'Denied'), (8, 'Accepted')
;
SELECT email, CountDecision
FROM (
    SELECT er.email,
           COUNT(fd.Decision) AS CountDecision,
           MAX(COUNT(fd.Decision)) OVER () AS maxCountDecision
    FROM EmailReferences AS er
    JOIN FinalDecision AS fd ON er.ApplicationID = fd.ApplicationID
    WHERE fd.Decision = 'Accepted'
    GROUP BY er.email
) d
WHERE CountDecision = maxCountDecision;
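email          | CountDecision
---------------+--------------
ref10@test.org | 2
(The sample data in this fiddle lists ref30@test.org rather than ref11@test.org for application 3, so only ref10@test.org reaches a count of 2 here.)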
dbfiddle here
For example...
SELECT a.*
FROM
  ( SELECT x.Email
         , COUNT(*) AS total
    FROM EmailReferences x
    JOIN FinalDecision y
      ON y.ApplicationID = x.ApplicationID
    WHERE y.Decision = 'Accepted'
    GROUP BY x.Email
  ) a
JOIN
  ( SELECT COUNT(*) AS total
    FROM EmailReferences x
    JOIN FinalDecision y
      ON y.ApplicationID = x.ApplicationID
    WHERE y.Decision = 'Accepted'
    GROUP BY x.Email
    ORDER BY total DESC
    LIMIT 1
  ) b
  ON b.total = a.total;