BigQuery - Scalar subquery produced more than one element

Question

I have the following data:

LastSubmission          | PatientName | Phone   | SubmissionId | HealthCondition
2020-12-17 16:02:56 UTC |  a          |123456789| abc123       | Good
2020-12-18 14:24:33 UTC |  a          |123456789| abc123       | Bad
2020-12-18 14:24:51 UTC |  b          |523456789| def321       | okay
2020-12-18 14:25:09 UTC |  b          |523456789| def321       | bad
2020-12-21 17:11:40 UTC |  c          |623456789| hij987       | better
2020-12-21 17:05:30 UTC |  c          |623456789| hij981       | worse

I want to write a query that returns only the latest row for each SubmissionId.

Currently, I have the following query:

SELECT *
FROM `myproject.dataset.qualtrics`
WHERE LastSubmission = 
( 
SELECT MAX(LastSubmission), 
FROM `myproject.dataset.qualtrics` 
GROUP BY SubmissionID, LastSubmission
) 
;

But when I run this, I get the error 'Scalar subquery produced more than one element'. Please help me solve this.

Answer 1

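The error occurs because your subquery groups by SubmissionID and LastSubmission, so it returns one row per group instead of a single value, while the = comparison needs exactly one value on its right-hand side. You want a correlated subquery instead: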

SELECT q.*
FROM `myproject.dataset.qualtrics` q
WHERE LastSubmission = (SELECT MAX(q2.LastSubmission) 
                        FROM `myproject.dataset.qualtrics` q2
                        WHERE q2.SubmissionID = q.SubmissionID
                       ) ;

A more idiomatic BigQuery way to write the query uses aggregation:

select array_agg(q order by q.LastSubmission desc limit 1)[ordinal(1)].*
from `myproject.dataset.qualtrics` q
group by q.SubmissionID;
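
To check the logic of the correlated subquery against the sample rows from the question without a BigQuery project, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in here and is not BigQuery: the in-memory table name qualtrics, the TEXT column types, and the helper latest_per_submission are assumptions made for illustration. SQLite also does not raise the scalar subquery error, so the sketch counts the rows of the original subquery instead.

```python
import sqlite3

# Sample rows from the question (timestamps kept as ISO-style text,
# which sorts correctly as strings).
ROWS = [
    ("2020-12-17 16:02:56", "a", "123456789", "abc123", "Good"),
    ("2020-12-18 14:24:33", "a", "123456789", "abc123", "Bad"),
    ("2020-12-18 14:24:51", "b", "523456789", "def321", "okay"),
    ("2020-12-18 14:25:09", "b", "523456789", "def321", "bad"),
    ("2020-12-21 17:11:40", "c", "623456789", "hij987", "better"),
    ("2020-12-21 17:05:30", "c", "623456789", "hij981", "worse"),
]


def latest_per_submission(rows):
    """Hypothetical helper: load rows into SQLite and run both subquery shapes."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE qualtrics (LastSubmission TEXT, PatientName TEXT, "
        "Phone TEXT, SubmissionId TEXT, HealthCondition TEXT)"
    )
    conn.executemany("INSERT INTO qualtrics VALUES (?, ?, ?, ?, ?)", rows)

    # The subquery from the question, run on its own: one row per
    # (SubmissionId, LastSubmission) group, so more than one value.
    uncorrelated = conn.execute(
        "SELECT MAX(LastSubmission) FROM qualtrics "
        "GROUP BY SubmissionId, LastSubmission"
    ).fetchall()

    # The correlated version: the inner query is evaluated per outer row
    # and returns a single MAX for that row's SubmissionId.
    latest = conn.execute(
        """
        SELECT q.SubmissionId, q.LastSubmission, q.HealthCondition
        FROM qualtrics q
        WHERE q.LastSubmission = (SELECT MAX(q2.LastSubmission)
                                  FROM qualtrics q2
                                  WHERE q2.SubmissionId = q.SubmissionId)
        ORDER BY q.SubmissionId
        """
    ).fetchall()
    conn.close()
    return len(uncorrelated), latest


if __name__ == "__main__":
    subquery_rows, latest = latest_per_submission(ROWS)
    print("rows returned by the original subquery:", subquery_rows)
    for row in latest:
        print(row)
```

With the sample data, the original subquery yields six rows (every SubmissionId/LastSubmission pair is distinct), while the correlated query keeps one row per SubmissionId, four in total because hij987 and hij981 are different IDs.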

solution1
2 ACCEPTED 2020-12-22 17:47:11