
MySQL: optimise LEFT JOIN with sub-query on a large dataset (query taking way too long)

I have a large dataset and need to do three joins, one of which is against a sub-query. I chose a sub-query instead of a WHERE clause (IN or FIND_IN_SET) so that I don't lose rows from my left (base) table; I need every row of the left table. Overall, I'm matching 11 million values against 900,000 values, so I expect this to take a while, but it took ~20 seconds on a set of just 200.

The engine is InnoDB, and each table has a primary key (IDvar).

I use the sub-query because I have too many values to select from (val1, val2, ..., val100), and I want to avoid chaining a separate clause for every 'val' with 'AND'.

The query I am using is:

    SELECT *
    FROM table1
    LEFT JOIN (SELECT * FROM table2 WHERE table2.var IN(val1, val2,..., val100)) AS t
        USING (IDvar)
    LEFT JOIN table3 
        USING (IDvar);

The query looks fine to me. You'd want the following indexes:

    create index idx_t1 on table1(idvar);
    create index idx_t2 on table2(var, idvar);
    create index idx_t3 on table3(idvar);

(Maybe it's just the second one that's missing.)
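
To check whether the second index really is the missing one, one option is to list the indexes that already exist and then look at the plan with EXPLAIN. This is only a sketch: the literals 1, 2, 3 stand in for val1 ... val100, and it assumes var is a numeric column.

```sql
-- List the indexes table2 already has; the primary key on IDvar appears here.
SHOW INDEX FROM table2;

-- Show the execution plan for the query from the question.
-- 1, 2, 3 are placeholder values standing in for val1 ... val100.
EXPLAIN
SELECT *
FROM table1
LEFT JOIN (SELECT * FROM table2 WHERE table2.var IN (1, 2, 3)) AS t
    USING (IDvar)
LEFT JOIN table3
    USING (IDvar);
```

In the EXPLAIN output, the key column names the index chosen for each table, and a type of ALL means a full table scan. Since IDvar is already the primary key of every table, the indexes on table1(idvar) and table3(idvar) add nothing new, which fits the guess that only the second one is missing.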

May I ask why you do not use an ON clause?

In general, when I do joins, I write them like this:

    SELECT *
    FROM table1
    JOIN table2 ON table1.common_var = table2.common_var
    JOIN table3 ON table1.common_var2 = table3.common_var2
    WHERE ...;

so that there is no need to load the whole huge table.
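
Applied to the query in the question, an ON-based version could look like the sketch below. It uses LEFT JOIN rather than JOIN and keeps the table2.var filter inside ON instead of WHERE, so every row of table1 is still returned. The literals 1, 2, 3 are placeholders for val1 ... val100, and the aliases t1, t2, t3 are introduced only for readability.

```sql
SELECT *
FROM table1 AS t1
LEFT JOIN table2 AS t2
       ON t2.IDvar = t1.IDvar
      AND t2.var IN (1, 2, 3)      -- placeholder values; filter applied while joining
LEFT JOIN table3 AS t3
       ON t3.IDvar = t1.IDvar;
```

This form needs no derived table. One visible difference from the USING version: SELECT * here returns IDvar once per table rather than a single merged column.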

If every possible combination of the two tables is needed, we can fetch the two tables separately and build the combinations programmatically:

    SELECT * FROM table1;
    SELECT * FROM table2;
    -- ... the rest in another program ...

Won't this do the same task? And possibly be more efficient?

    SELECT t1.*, t2.*, t3.*
    FROM      table1 AS t1
    LEFT JOIN table2 AS t2  USING (IDvar)
    LEFT JOIN table3 AS t3  USING (IDvar)
    WHERE t2.var IN(val1, val2,..., val100);

Indexes needed:

    t2:  (IDvar, var)  -- in this order
    t3:  (IDvar)

No index on t1 will be used.

Whether LEFT is present or not makes a big difference in this query.
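
One caveat worth checking before relying on the WHERE version: a condition on t2.var in WHERE is evaluated after the join, so table1 rows with no matching table2 row (where t2.var is NULL) are filtered out and the LEFT JOIN behaves like an inner join. The miniature tables below (demo1 and demo2, hypothetical names and values chosen only for illustration) show the difference between filtering in WHERE and filtering in ON.

```sql
-- Hypothetical miniature tables, for illustration only.
CREATE TABLE demo1 (IDvar INT PRIMARY KEY);
CREATE TABLE demo2 (IDvar INT PRIMARY KEY, var INT);

INSERT INTO demo1 VALUES (1), (2), (3);
INSERT INTO demo2 VALUES (1, 10), (2, 99);

-- Filter in WHERE: returns only (1, 10).
-- IDvar 2 fails the IN test, and IDvar 3 has var = NULL, which also fails it.
SELECT d1.IDvar, d2.var
FROM demo1 AS d1
LEFT JOIN demo2 AS d2 USING (IDvar)
WHERE d2.var IN (10, 20)
ORDER BY d1.IDvar;

-- Filter in ON: returns (1, 10), (2, NULL), (3, NULL).
-- Every row of demo1 is kept; unmatched rows get NULL for var.
SELECT d1.IDvar, d2.var
FROM demo1 AS d1
LEFT JOIN demo2 AS d2
       ON d2.IDvar = d1.IDvar
      AND d2.var IN (10, 20)
ORDER BY d1.IDvar;
```

The second form matches what the sub-query in the question does, so it is the one to compare against if all rows of the left table must be preserved.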

