
SQL subquery to return MIN of a column and corresponding values from another column

I'm trying to query, for each student who is not currently expelled:

  1. the number of courses passed,
  2. the earliest course passed, and
  3. the time taken to pass the first course.

The tricky part here is 2). I constructed a subquery by joining the course table to itself and restricting matches to rows where datepassed = MIN(datepassed). The query appears to work for a very small sample, but when I apply it to my full data set (which would return ~1 million records), it takes impossibly long to execute (I left it running for more than 2 hours and it still hadn't completed).

Is there a more efficient way to do this? Appreciate all your help!

Query:

SELECT 
  S.id,
  COUNT(C.course) as course_count,
  C2.course as first_course,
  DATEDIFF(MIN(C.datepassed),S.dateenrolled) as days_to_first
FROM student S
LEFT JOIN course C 
  ON C.studentid = S.id
LEFT JOIN (SELECT * FROM course GROUP BY studentid HAVING datepassed IN (MIN(datepassed))) C2
  ON C2.studentid = C.studentid
WHERE YEAR(S.dateenrolled)=2013 
  AND S.id NOT IN (SELECT id FROM expelled)
GROUP BY S.id
ORDER BY S.id

Student table

id  status  dateenrolled
1   graduated   1/1/2013
3   graduated   1/1/2013

Expelled table

id  dateexpelled
2   5/1/2013

Course table

studentid   course  datepassed
1   courseA 5/1/2014
1   courseB 1/1/2014
1   courseC 2/1/2014
1   courseD 3/1/2014
3   courseA 1/1/2014
3   courseB 2/1/2014
3   courseC 3/1/2014
3   courseD 4/1/2014
3   courseE 5/1/2014

SELECT id, course_count, days_to_first, C2.course first_course
FROM (
    SELECT S.id, COUNT(C.course) course_count, 
        DATEDIFF(MIN(datepassed),S.dateenrolled) as days_to_first,
        MIN(datepassed) min_datepassed
    FROM student S
        LEFT JOIN course C ON C.studentid = S.id 
    WHERE S.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
        AND S.id NOT IN (SELECT id FROM expelled)
    GROUP BY S.id
) t1 LEFT JOIN course C2 
    ON C2.studentid = t1.id
    AND C2.datepassed = t1.min_datepassed
ORDER BY id

I would try something like:

SELECT s.id, f.course,
  COALESCE( DATEDIFF( c.first_pass,s.dateenrolled), 0 ) AS days_to_pass, 
  COALESCE( c.num_courses, 0 ) AS courses
FROM student s
LEFT JOIN
( SELECT studentid, MIN(datepassed) AS first_pass, COUNT(*) AS num_courses
  FROM course
  GROUP BY studentid ) c
ON s.id = c.studentid
LEFT JOIN course f
ON c.studentid = f.studentid AND c.first_pass = f.datepassed
LEFT JOIN expelled e
ON s.id = e.id
WHERE s.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
AND e.id IS NULL

This query assumes a student can pass only one course on a given day. Otherwise you can get more than one row per student, since several courses could tie as the first course passed.
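To make the tie caveat concrete, here is a minimal sketch that runs the join-back-on-MIN pattern against an in-memory SQLite database from Python. SQLite stands in for the asker's database only so the example is self-contained; the ISO-formatted dates, the extra tied row for student 1, and the MIN(f.course) tie-break are all assumptions made for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (studentid INTEGER, course TEXT, datepassed TEXT);
INSERT INTO course VALUES
  (1, 'courseB', '2014-01-01'),
  (1, 'courseC', '2014-01-01'),  -- hypothetical tie: two passes on the same day
  (1, 'courseA', '2014-05-01'),
  (3, 'courseA', '2014-01-01'),
  (3, 'courseB', '2014-02-01');
""")

# Join each student's MIN(datepassed) back to course: a tie yields two rows.
joined = conn.execute("""
SELECT c.studentid, f.course
FROM (SELECT studentid, MIN(datepassed) AS first_pass
      FROM course
      GROUP BY studentid) c
JOIN course f
  ON f.studentid = c.studentid AND f.datepassed = c.first_pass
ORDER BY c.studentid, f.course
""").fetchall()

# One possible tie-break (an arbitrary choice): keep the alphabetically
# first course among those passed on the earliest date.
tie_broken = conn.execute("""
SELECT c.studentid, MIN(f.course) AS first_course
FROM (SELECT studentid, MIN(datepassed) AS first_pass
      FROM course
      GROUP BY studentid) c
JOIN course f
  ON f.studentid = c.studentid AND f.datepassed = c.first_pass
GROUP BY c.studentid
ORDER BY c.studentid
""").fetchall()

print(joined)      # student 1 appears twice
print(tie_broken)  # one row per student
```

Whether an alphabetical tie-break is acceptable depends on what "first course" should mean when two passes share a date; a timestamp column, if one exists, would be a more natural discriminator.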

For performance, it would help to have an index on dateenrolled in the student table and a composite index on (studentid, datepassed) in the course table.
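The index suggestion could be sketched as follows, again using an in-memory SQLite database from Python so the example runs as-is. The index names are hypothetical, and the column types are guesses from the sample data. The basic CREATE INDEX form shown here is also accepted by MySQL, but the query-plan check is SQLite-specific and its exact wording varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, status TEXT, dateenrolled TEXT);
CREATE TABLE course (studentid INTEGER, course TEXT, datepassed TEXT);

-- Index names below are hypothetical.
CREATE INDEX idx_student_dateenrolled ON student (dateenrolled);
CREATE INDEX idx_course_student_date ON course (studentid, datepassed);
""")

# Ask SQLite how it would look up one student's course on a given date,
# which is the lookup the join back to course performs per student.
plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT course FROM course
    WHERE studentid = ? AND datepassed = ?
    """,
    (1, "2014-01-01"),
).fetchall()

# The last column of each plan row is the human-readable detail string.
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
```

On the real database, the equivalent check would be to run EXPLAIN on the full query before and after adding the indexes and compare the plans.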


 