I have a table cast with about 1.5 million rows, and a smaller table watched with about 1000-2000 rows. Both tables share a column named movieId. I am trying to run this query:
SELECT actorId, COUNT( actorId )
FROM cast t1
WHERE EXISTS (
SELECT userId
FROM watched t2
WHERE t1.movieId = t2.movieId
AND t2.userId =8
)
GROUP BY actorId
However, it takes about 5 seconds to return the results. I have a multi-column index on actorId and movieId in the cast table, and indices on userId and movieId in the watched table. The query returns around 20,000 rows. Is there any way I could optimize my query or tables so that the query runs faster?
For this query:
SELECT c.actorId, COUNT(*)
FROM cast c
WHERE EXISTS (SELECT 1
FROM watched w
WHERE w.movieId = c.movieId AND w.userId = 8
)
GROUP BY c.actorId;
You want an index on watched(movieId, userId). An index on cast(movieId, actorId) might also prove useful.
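As a minimal sketch, the two indexes mentioned above could be created as follows. This assumes MySQL-style syntax, and the index names are hypothetical, chosen only for illustration:

```sql
-- Assumption: MySQL-style syntax; index names are made up for this example.

-- Lets the EXISTS subquery be answered from the index alone:
-- look up by movieId, then check userId.
CREATE INDEX idx_watched_movie_user ON watched (movieId, userId);

-- Covers both columns the outer query reads from cast.
-- The backticks are a precaution, because CAST is also a built-in
-- function name and "cast(" could otherwise be parsed as a function call.
CREATE INDEX idx_cast_movie_actor ON `cast` (movieId, actorId);
```

Note that the existing index on (actorId, movieId) has the columns in the opposite order, so it cannot be used to look up rows by movieId.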
Notice that I changed the table aliases to be more meaningful than arbitrary letters.
EDIT:
Given the size of the tables, I think an explicit join might be better:
SELECT c.actorId, COUNT(*)
FROM watched w JOIN
cast c
ON w.movieId = c.movieId
WHERE w.userId = 8
GROUP BY c.actorId;
For this query, you want indexes on watched(userId, movieId) and cast(movieId, actorId). This version assumes you don't have duplicate rows in watched.
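If watched could contain the same (userId, movieId) pair more than once, the plain join would count each cast row once per duplicate. One possible way to guard against that, sketched here under the assumption of MySQL-style syntax, is to join against a de-duplicated list of movies. The alias `m` is introduced only for this example:

```sql
-- Sketch: same result as the EXISTS version even if watched has
-- duplicate (userId, movieId) rows. The derived table alias "m"
-- is hypothetical.
SELECT c.actorId, COUNT(*)
FROM (SELECT DISTINCT w.movieId
      FROM watched w
      WHERE w.userId = 8
     ) m JOIN
     cast c
     ON c.movieId = m.movieId
GROUP BY c.actorId;
```

If the table already enforces uniqueness on (userId, movieId), for example through a primary key or unique index, the extra DISTINCT step is unnecessary.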
Perhaps using an inner join instead of an EXISTS will give you better performance. Assuming movieId and userId are indexed, try inner joining to watched, using the filters from your nested WHERE clause:
SELECT .....
FROM
cast c INNER JOIN watched w
ON w.movieId = c.movieId
AND w.userId = 8
GROUP BY ....
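In theory, the above should be a less expensive operation, because the filter on userId can narrow watched to a few rows first, instead of every cast record being tested in an EXISTS clause. In practice the optimizer may choose the same plan for both forms, so it is worth comparing them.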
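One way to check whether the join really behaves differently is to look at the execution plan for each form. The sketch below assumes a MySQL-style EXPLAIN statement, and fills in the select list from the question for the join version:

```sql
-- Plan for the EXISTS form from the question.
EXPLAIN
SELECT c.actorId, COUNT(*)
FROM cast c
WHERE EXISTS (SELECT 1
              FROM watched w
              WHERE w.movieId = c.movieId AND w.userId = 8
             )
GROUP BY c.actorId;

-- Plan for the inner join form (select list assumed from the question).
EXPLAIN
SELECT c.actorId, COUNT(*)
FROM cast c INNER JOIN watched w
     ON w.movieId = c.movieId
     AND w.userId = 8
GROUP BY c.actorId;
```

Comparing which table is read first, which index is used, and the estimated row counts in the two plans should show whether the rewrite helps on your data.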
Please excuse the lack of styling, I'm posting from an iPad.