SQL return results for Table A, based on criteria from Table B

Question

I have 2 tables which share a 1 to many relationship. Assume the following structure:

users             user_metadata
-------------     -------------
id | email        id | user_id | type | score

A user can have many metadata rows. The users table has 100k rows and the user_metadata table has 300k rows. The data will likely grow 10x, so whatever I write needs to stay efficient on large amounts of data.

I need to write a SQL statement that returns only the emails of users who pass a couple of different score conditions found in the metadata table.

// if type = 1 and if score > 75 then <1 point> else <0 points>
// if type = 2 and if score > 100 then <1 point> else <0 points>
// if type = 3 and if score > 0 then <-10 points> else <0 points>

// there are other types that we want to ignore in the score calculations

If the user passes a threshold (e.g. >= 1 point) then I want that user to be in the result set; otherwise I want the user to be ignored.

I have tried using a stored function/cursor that takes a user_id and loops over the metadata to figure out the points, but the resulting execution was very slow (although it did work).

As it stands I have this, and it takes about 1 to 3 seconds to execute.

SELECT u.id, u.email,

    (
        SELECT 
            SUM(
                IF(k.type = 1, IF(k.score > 75, 1, 0), 0) + 
                IF(k.type = 2, IF(k.score > 100, 1, 0), 0) +
                IF(k.type = 3, IF(k.score > 0, -10, 0), 0)
            ) 
        FROM user_metadata k WHERE k.user_id = u.id
        
    ) AS total

FROM users u GROUP BY u.id HAVING total IS NOT NULL;

I feel like at 10x this is going to be even slower. A 1 to 3 second query execution time is already too slow for what I need.

What would a more optimal approach be?

If I also use a language like PHP for this, would it be better to run 2 queries: one to fetch the user_ids of only the passing users from user_metadata, and a second to SELECT ... WHERE id IN (...) on that list of ids?

Answer 1

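Try using a JOIN instead of a correlated subquery. The derived table aggregates user_metadata once with GROUP BY, rather than re-running the subquery for every row of users.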

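SELECT u.id, u.email, t.total
FROM users AS u
JOIN (
    -- Score each user once; types other than 1, 2, 3 contribute 0
    SELECT user_id, SUM(CASE type
        WHEN 1 THEN score > 75              -- a comparison evaluates to 1 or 0 in MySQL
        WHEN 2 THEN score > 100
        WHEN 3 THEN IF(score > 0, -10, 0)   -- penalty rule from the question
        ELSE 0
        END) AS total
    FROM user_metadata
    GROUP BY user_id
    HAVING total >= 1                       -- example threshold from the question
) AS t ON u.id = t.user_id;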

There's also no need for you to use GROUP BY u.id in your query, since that's the primary key of the table you're querying; hopefully MySQL will optimize that out.
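
To check the scoring rules on a small data set before running them against the real tables, the join-plus-aggregate approach can be sketched as below. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the sample rows and the helper names build_demo_db and passing_users are made up for the example, and the CASE expression is written in portable form instead of relying on MySQL's IF().

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
        CREATE TABLE user_metadata (
            id INTEGER PRIMARY KEY,
            user_id INTEGER REFERENCES users(id),
            type INTEGER,
            score INTEGER
        );
        INSERT INTO users (id, email) VALUES
            (1, 'a@example.com'),
            (2, 'b@example.com'),
            (3, 'c@example.com'),
            (4, 'd@example.com');   -- user 4 has no metadata at all
        INSERT INTO user_metadata (user_id, type, score) VALUES
            (1, 1, 80),    -- type 1, score > 75   => +1
            (1, 2, 150),   -- type 2, score > 100  => +1   (user 1 total: 2)
            (2, 1, 80),    -- +1
            (2, 3, 5),     -- type 3, score > 0    => -10  (user 2 total: -9)
            (3, 1, 50),    -- below the type 1 cutoff => 0
            (3, 4, 999);   -- ignored type => 0            (user 3 total: 0)
    """)
    return conn


# The derived table scores every user in one pass over user_metadata;
# the outer query keeps only users at or above the threshold.
QUERY = """
    SELECT u.id, u.email, t.total
    FROM users AS u
    JOIN (
        SELECT user_id,
               SUM(CASE
                       WHEN type = 1 AND score > 75  THEN 1
                       WHEN type = 2 AND score > 100 THEN 1
                       WHEN type = 3 AND score > 0   THEN -10
                       ELSE 0
                   END) AS total
        FROM user_metadata
        GROUP BY user_id
    ) AS t ON t.user_id = u.id
    WHERE t.total >= ?
    ORDER BY u.id
"""


def passing_users(conn, threshold=1):
    """Return (id, email, total) for users whose total meets the threshold."""
    return conn.execute(QUERY, (threshold,)).fetchall()


if __name__ == "__main__":
    for row in passing_users(build_demo_db()):
        print(row)
```

With the sample rows above and the default threshold of 1, only user 1 is returned. Because this is an inner join, a user with no metadata rows never appears, which plays the same role as the HAVING total IS NOT NULL filter in the original query.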
