I have the following data:
LastSubmission          | PatientName | Phone     | SubmissionId | HealthCondition
2020-12-17 16:02:56 UTC | a           | 123456789 | abc123       | Good
2020-12-18 14:24:33 UTC | a           | 123456789 | abc123       | Bad
2020-12-18 14:24:51 UTC | b           | 523456789 | def321       | okay
2020-12-18 14:25:09 UTC | b           | 523456789 | def321       | bad
2020-12-21 17:11:40 UTC | c           | 623456789 | hij987       | better
2020-12-21 17:05:30 UTC | c           | 623456789 | hij981       | worse
I want to write a query that returns only the latest row for each SubmissionId.
Currently, I have the following query:
SELECT *
FROM `myproject.dataset.qualtrics`
WHERE LastSubmission =
(
SELECT MAX(LastSubmission),
FROM `myproject.dataset.qualtrics`
GROUP BY SubmissionID, LastSubmission
)
;
But when I run this, I get the error 'Scalar subquery produced more than one element'. Please help me solve this.
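The subquery in your version returns one row per group, so it cannot be compared to a single value with =. You want a correlated subquery, which returns exactly one value for the SubmissionID of each outer row: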
SELECT q.*
FROM `myproject.dataset.qualtrics` q
WHERE q.LastSubmission = (SELECT MAX(q2.LastSubmission)
                          FROM `myproject.dataset.qualtrics` q2
                          WHERE q2.SubmissionID = q.SubmissionID
                         );
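To try the correlated subquery without touching the real table, here is a minimal sketch that inlines the sample rows from the question in a WITH clause. The CTE name qualtrics and the column types (TIMESTAMP for LastSubmission, STRING for everything else) are assumptions; the actual table may be typed differently.

```sql
-- Sample rows from the question, inlined so the query runs on its own.
-- Column types are assumed: TIMESTAMP for LastSubmission, STRING otherwise.
WITH qualtrics AS (
  SELECT TIMESTAMP '2020-12-17 16:02:56 UTC' AS LastSubmission,
         'a' AS PatientName, '123456789' AS Phone,
         'abc123' AS SubmissionId, 'Good' AS HealthCondition
  UNION ALL SELECT TIMESTAMP '2020-12-18 14:24:33 UTC', 'a', '123456789', 'abc123', 'Bad'
  UNION ALL SELECT TIMESTAMP '2020-12-18 14:24:51 UTC', 'b', '523456789', 'def321', 'okay'
  UNION ALL SELECT TIMESTAMP '2020-12-18 14:25:09 UTC', 'b', '523456789', 'def321', 'bad'
  UNION ALL SELECT TIMESTAMP '2020-12-21 17:11:40 UTC', 'c', '623456789', 'hij987', 'better'
  UNION ALL SELECT TIMESTAMP '2020-12-21 17:05:30 UTC', 'c', '623456789', 'hij981', 'worse'
)
SELECT q.*
FROM qualtrics AS q
-- The inner query is re-evaluated per outer row and yields a single value:
-- the latest LastSubmission for that row's SubmissionId.
WHERE q.LastSubmission = (
  SELECT MAX(q2.LastSubmission)
  FROM qualtrics AS q2
  WHERE q2.SubmissionId = q.SubmissionId
)
ORDER BY q.SubmissionId;
```

On this sample, the rows kept are the 'Bad' row for abc123 and the 'bad' row for def321. Both rows for patient c are also kept, because hij987 and hij981 are different SubmissionId values. Note that if two rows of the same SubmissionId share the same maximum LastSubmission, this form returns both of them.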
A more BigQuery-idiomatic way to write the query uses aggregation:
select array_agg(q order by q.LastSubmission desc limit 1)[ordinal(1)].*
from `myproject.dataset.qualtrics` q
group by q.SubmissionID;
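Another common way to express "latest row per group" is a window function. The sketch below is an alternative to the two queries above, not a restatement of them: it numbers the rows of each SubmissionID from newest to oldest and keeps row number 1. The WHERE TRUE line is an assumption made for safety, since BigQuery has at times required QUALIFY to be accompanied by a WHERE, GROUP BY, or HAVING clause; it does not filter anything.

```sql
SELECT q.*
FROM `myproject.dataset.qualtrics` AS q
WHERE TRUE  -- no-op filter, kept only so QUALIFY has an accompanying WHERE clause
-- Rank rows within each SubmissionID, newest first, and keep the top one.
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY q.SubmissionID
  ORDER BY q.LastSubmission DESC
) = 1;
```

Unlike the correlated subquery, this form returns exactly one row per SubmissionID even when two rows tie on LastSubmission; which of the tied rows is kept is then arbitrary unless a tie-breaking column is added to the ORDER BY.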