I have 2 tables which share a 1 to many relationship. Assume the following structure:
users              user_metadata
-------------      -------------------------------
id | email         id | user_id | type | score
A user can have many metadata rows. The users table has 100k rows and the user_metadata table has 300k rows. It'll likely grow 10x, so whatever I write needs to be optimal for large amounts of data.
I need to write a SQL statement that returns only the emails of users who pass a couple of different score conditions found in the metadata table.
// if type = 1 and if score > 75 then <1 point> else <0 points>
// if type = 2 and if score > 100 then <1 point> else <0 points>
// if type = 3 and if score > 0 then [-10 points] else <0 points>
// there are other types that we want to ignore in the score calculations
If the user passes a threshold (e.g. >= 1 point) then I want that user to be in the result set, otherwise I want the user to be ignored.
I have tried using a stored function with a cursor that takes a user_id and loops over the metadata to figure out the points, but the resulting execution was very slow (although it did work).
As it stands I have this, and it takes about 1 to 3 seconds to execute.
SELECT u.id, u.email,
    (
        SELECT
            SUM(
                IF(k.type = 1, IF(k.score > 75, 1, 0), 0) +
                IF(k.type = 2, IF(k.score > 100, 1, 0), 0) +
                IF(k.type = 3, IF(k.score > 0, -10, 0), 0)
            )
        FROM user_metadata k WHERE k.user_id = u.id
    ) AS total
FROM users u GROUP BY u.id HAVING total IS NOT NULL;
I feel like at 10x this is going to be even slower. A 1 to 3 second query execution time is already too slow for what I need.
What would a more optimal approach be?
If I also use a language like PHP for this, would it be better to run 2 queries: one to fetch the user_ids of only the passing users from user_metadata, and then a second to SELECT WHERE IN on that list of ids?
Try using a JOIN instead of a correlated subquery.
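SELECT u.id, u.email, t.total
FROM users AS u
JOIN (
    SELECT user_id,
           SUM(CASE type
                   WHEN 1 THEN score > 75               -- comparison yields 1 or 0 in MySQL
                   WHEN 2 THEN score > 100
                   WHEN 3 THEN IF(score > 0, -10, 0)    -- penalty rule from the question
                   ELSE 0                               -- other types contribute nothing
               END) AS total
    FROM user_metadata
    GROUP BY user_id
) AS t ON u.id = t.user_id;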
There's also no need for you to use GROUP BY u.id in your query, since that's the primary key of the table you're querying; hopefully MySQL will optimize that out.
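The question also asks for a threshold, which the query above does not filter on. Below is a minimal sketch of how that could be added, assuming the scoring rules as written in the question (type 3 with score > 0 costs 10 points) and MySQL syntax. The index name idx_user_metadata_scoring is hypothetical, and whether the index helps at your data size is something to confirm with EXPLAIN rather than take for granted.

```sql
-- Hypothetical covering index: lets the aggregation read user_id, type and
-- score from the index alone instead of touching the table rows.
CREATE INDEX idx_user_metadata_scoring
    ON user_metadata (user_id, type, score);

SELECT u.id, u.email, t.total
FROM users AS u
JOIN (
    SELECT user_id,
           SUM(CASE type
                   WHEN 1 THEN IF(score > 75, 1, 0)
                   WHEN 2 THEN IF(score > 100, 1, 0)
                   WHEN 3 THEN IF(score > 0, -10, 0)
               END) AS total
    FROM user_metadata
    WHERE type IN (1, 2, 3)   -- other types never affect the score
    GROUP BY user_id
    HAVING total >= 1         -- threshold from the question (>= 1 point)
) AS t ON t.user_id = u.id;
```

The derived table is effectively the "first query" of the two-query idea from the question: it returns only the passing user_ids. Joining it inside one statement should avoid sending the id list to PHP and back in a WHERE IN clause, though it is worth measuring both variants on realistic data.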