I'm trying to query
The tricky part here is 2). I constructed a sub-query by joining the course table onto itself but restricting matches to datepassed=min(datepassed). The query appears to work for a very small sample, but when I apply it to my full data set (which would return ~1 million records) it takes impossibly long to execute (I left it running for more than 2 hours and it still had not completed).
Is there a more efficient way to do this? Appreciate all your help!
Query:
SELECT
S.id,
COUNT(C.course) as course_count,
C2.course as first_course,
DATEDIFF(MIN(C.datepassed),S.dateenrolled) as days_to_first
FROM student S
LEFT JOIN course C
ON C.studentid = S.id
LEFT JOIN (SELECT * FROM course GROUP BY studentid HAVING datepassed IN (MIN(datepassed))) C2
ON C2.studentid = C.studentid
WHERE YEAR(S.dateenrolled)=2013
AND S.id NOT IN (SELECT id FROM expelled)
GROUP BY S.id
ORDER BY S.id
Student table
id status dateenrolled
1 graduated 1/1/2013
3 graduated 1/1/2013
Expelled table
id dateexpelled
2 5/1/2013
Course table
studentid course datepassed
1 courseA 5/1/2014
1 courseB 1/1/2014
1 courseC 2/1/2014
1 courseD 3/1/2014
3 courseA 1/1/2014
3 courseB 2/1/2014
3 courseC 3/1/2014
3 courseD 4/1/2014
3 courseE 5/1/2014
SELECT id, course_count, days_to_first, C2.course first_course
FROM (
SELECT S.id, COUNT(C.course) course_count,
DATEDIFF(MIN(datepassed),S.dateenrolled) as days_to_first,
MIN(datepassed) min_datepassed
FROM student S
LEFT JOIN course C ON C.studentid = S.id
WHERE S.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
AND S.id NOT IN (SELECT id FROM expelled)
GROUP BY S.id
) t1 LEFT JOIN course C2
ON C2.studentid = t1.id
AND C2.datepassed = t1.min_datepassed
ORDER BY id
I would try something like:
SELECT s.id, f.course,
COALESCE( DATEDIFF( c.first_pass,s.dateenrolled), 0 ) AS days_to_pass,
COALESCE( c.num_courses, 0 ) AS courses
FROM student s
LEFT JOIN
( SELECT studentid, MIN(datepassed) AS first_pass, COUNT(*) AS num_courses
FROM course
GROUP BY studentid ) c
ON s.id = c.studentid
LEFT JOIN course f
ON c.studentid = f.studentid AND c.first_pass = f.datepassed
LEFT JOIN expelled e
ON s.id = e.id
WHERE s.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
AND e.id IS NULL
This query assumes a student can pass only one course on a given day. Otherwise you can get more than one row per student, because several courses can tie for the earliest datepassed.
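If ties on the first pass date are possible, one way to keep a single row per student is to group the result and pick one of the tied courses. The following is only a sketch under assumptions: it uses MIN(f.course) as an arbitrary but deterministic tie-break (the alphabetically first course name), and it uses LEFT JOIN for the course lookup so that students without any passed course are kept. Neither choice comes from the original question.

```sql
-- Sketch: one row per student even when several courses share the first pass date.
-- Assumption: picking the alphabetically first course name is an acceptable tie-break.
SELECT s.id,
       MIN(f.course) AS first_course,
       COALESCE(DATEDIFF(c.first_pass, s.dateenrolled), 0) AS days_to_pass,
       COALESCE(c.num_courses, 0) AS courses
FROM student s
LEFT JOIN
  ( SELECT studentid, MIN(datepassed) AS first_pass, COUNT(*) AS num_courses
    FROM course
    GROUP BY studentid ) c
  ON s.id = c.studentid
-- LEFT JOIN (an assumption) keeps students who have not passed any course
LEFT JOIN course f
  ON c.studentid = f.studentid AND c.first_pass = f.datepassed
LEFT JOIN expelled e
  ON s.id = e.id
WHERE s.dateenrolled BETWEEN '2013-01-01' AND '2013-12-31'
  AND e.id IS NULL
-- All non-aggregated columns are listed so the query also runs under ONLY_FULL_GROUP_BY
GROUP BY s.id, s.dateenrolled, c.first_pass, c.num_courses
ORDER BY s.id;
```

With the sample data from the question, no student has two courses on the same first date, so this should return the same first course as the ungrouped query.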
For performance, it would help to have an index on dateenrolled in the student table and a composite index on (studentid, datepassed) in the course table.
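As a rough illustration, those two indexes could be created as follows. The index names are made up for this example, and whether the optimizer actually uses them depends on your data and MySQL version, so it is worth checking the plan with EXPLAIN.

```sql
-- Hypothetical index names; the table and column names come from the question.
CREATE INDEX idx_student_dateenrolled ON student (dateenrolled);

-- Composite index: supports the per-student MIN(datepassed) aggregation
-- and the join on (studentid, datepassed).
CREATE INDEX idx_course_student_passed ON course (studentid, datepassed);
```

Note that a plain index on dateenrolled generally cannot be used for a predicate like YEAR(dateenrolled) = 2013, because the column is wrapped in a function. The BETWEEN range on the bare column, as used in the queries above, is the form that can take advantage of it.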