Is it possible to make this recommender SQL query faster?

I have 2 tables, one of them stores items and the other one stores likes.

The likes table is called video_liked and has two columns, video_id and user_id, with two indexes: (video_id, user_id) as UNIQUE and (user_id, video_id) as PRIMARY.

The other table is called video and has an auto-increment primary key column, id.

I am trying to get a list of items that were liked by the same users that liked the one the viewer is watching, ordered by the amount of people that liked them, with a minimum of 2 likes.

The query I am using is:

SELECT vid.*, count(video_liked1.user_id) AS PersonCount
FROM video AS vid, video_liked, video_liked AS video_liked1
WHERE video_liked.user_id = video_liked1.user_id
AND video_liked.video_id <> video_liked1.video_id
AND video_liked1.video_id = 'ITEM_ID'
AND vid.id = video_liked.video_id
GROUP BY video_liked.video_id
HAVING count(video_liked1.user_id) >= 2
ORDER BY PersonCount DESC
LIMIT 12

The query is slow when there are lots of likes, so I reduced it to its most basic structure:

SELECT vid.*
FROM video AS vid, video_liked, video_liked AS video_liked1
WHERE video_liked.user_id = video_liked1.user_id
AND video_liked.video_id <> video_liked1.video_id
AND video_liked1.video_id = 'ITEM_ID'
AND vid.id = video_liked.video_id
GROUP BY video_liked.video_id
LIMIT 12

It's a little faster, but it still takes 0.05 seconds to execute on a likes table with 28k rows.

EXPLAIN gives me output that is too wide to fit here without word wrapping, so here is a pastebin link instead:

http://pastebin.com/raw.php?i=6edwdniQ

Here are my table definitions, also on pastebin:

http://pastebin.com/raw.php?i=jwK1QucA

EDIT:

Changed the query as suggested:

SELECT vid.*, count(v1.user_id) AS PersonCount
FROM video AS vid
JOIN video_liked AS v1 ON vid.id = v1.video_id
JOIN video_liked AS v2 ON v2.video_id = 'ITEM_ID'
AND v1.user_id = v2.user_id
AND v1.video_id <> v2.video_id
GROUP BY v1.video_id
ORDER BY PersonCount DESC
LIMIT 12 

The slowness seems to come from the GROUP BY, which makes the server build a temporary table.
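If the GROUP BY temporary table really is the cost, one thing that may be worth trying is to group only the narrow video_liked rows in a derived table and join to video afterwards, so the wide vid.* columns never enter the grouped set. The sketch below is a guess at that shape, not a measured fix: it runs on Python's built-in sqlite3 module as a stand-in for the real server, the title column and the sample rows are assumptions (the real schema is only in the pastebin), and the helper names are made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database that mimics the two tables from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE video (
            id    INTEGER PRIMARY KEY AUTOINCREMENT,
            title TEXT NOT NULL          -- hypothetical column, not from the question
        );
        CREATE TABLE video_liked (
            user_id  INTEGER NOT NULL,
            video_id INTEGER NOT NULL,
            PRIMARY KEY (user_id, video_id),
            UNIQUE (video_id, user_id)
        );
    """)
    conn.executemany(
        "INSERT INTO video (id, title) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    )
    # (user_id, video_id) pairs: users 10-12 like video 1, and some like others too.
    conn.executemany(
        "INSERT INTO video_liked (user_id, video_id) VALUES (?, ?)",
        [(10, 1), (11, 1), (12, 1),
         (10, 2), (11, 2), (12, 2),
         (10, 3), (11, 3),
         (10, 4)],
    )
    return conn


# Aggregate on video_liked alone, then look up the video rows for the winners.
CO_LIKED_SQL = """
    SELECT vid.id, vid.title, rec.PersonCount
    FROM (
        SELECT v1.video_id, COUNT(*) AS PersonCount
        FROM video_liked AS v2
        JOIN video_liked AS v1 ON v1.user_id = v2.user_id
                              AND v1.video_id <> v2.video_id
        WHERE v2.video_id = :item_id
        GROUP BY v1.video_id
        HAVING COUNT(*) >= 2
        ORDER BY PersonCount DESC
        LIMIT 12
    ) AS rec
    JOIN video AS vid ON vid.id = rec.video_id
    ORDER BY rec.PersonCount DESC
"""


def co_liked(conn, item_id):
    """Return (id, title, PersonCount) rows for videos co-liked with item_id."""
    return conn.execute(CO_LIKED_SQL, {"item_id": item_id}).fetchall()


demo = build_demo_db()
print(co_liked(demo, 1))
```

Whether this actually avoids or shrinks the temporary table depends on the database and its optimizer, so it is worth comparing EXPLAIN output for both forms on the real data.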

Remove the CROSS JOINs (the comma-separated tables in the FROM clause) from your query. Those bloat your data set.

SELECT vid.*
FROM video AS vid
JOIN video_liked AS v1 ON vid.id = v1.video_id
JOIN video_liked AS v2 ON v2.video_id = 'ITEM_ID' AND v1.user_id = v2.user_id AND v1.video_id <> v2.video_id
GROUP BY v1.video_id
LIMIT 12

In addition to removing the cross join, I'd explicitly define the columns that you need in the SELECT clause, even if you need all columns.
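As a rough illustration of both suggestions together (explicit JOINs and a named column list), here is a small runnable sketch. It uses Python's built-in sqlite3 module rather than the asker's actual database, and the title column and the sample likes are assumptions made up for the demo, since the real video schema is only available in the pastebin.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE video (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,
        title TEXT NOT NULL              -- hypothetical column, not from the question
    );
    CREATE TABLE video_liked (
        user_id  INTEGER NOT NULL,
        video_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, video_id),
        UNIQUE (video_id, user_id)
    );
""")
conn.executemany(
    "INSERT INTO video (id, title) VALUES (?, ?)",
    [(1, "watched"), (2, "popular"), (3, "niche"), (4, "unrelated")],
)
# (user_id, video_id) pairs: three users like videos 1 and 2, one also likes 3.
conn.executemany(
    "INSERT INTO video_liked (user_id, video_id) VALUES (?, ?)",
    [(10, 1), (11, 1), (12, 1),
     (10, 2), (11, 2), (12, 2),
     (10, 3),
     (13, 4)],
)

# Explicit JOINs, explicit columns, and a bound parameter instead of 'ITEM_ID'.
RECOMMEND_SQL = """
    SELECT vid.id, vid.title, COUNT(v1.user_id) AS PersonCount
    FROM video AS vid
    JOIN video_liked AS v1 ON vid.id = v1.video_id
    JOIN video_liked AS v2 ON v2.video_id = ?
                          AND v1.user_id = v2.user_id
                          AND v1.video_id <> v2.video_id
    GROUP BY vid.id, vid.title
    HAVING COUNT(v1.user_id) >= 2
    ORDER BY PersonCount DESC
    LIMIT 12
"""


def recommend(conn, item_id):
    """Return (id, title, PersonCount) for videos liked by fans of item_id."""
    return conn.execute(RECOMMEND_SQL, (item_id,)).fetchall()


print(recommend(conn, 1))
```

Listing every selected column in GROUP BY also keeps the query valid on servers that reject selecting non-grouped columns.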

What platform is this DB on? What other indexes do you have on the video table?
