MySQL: optimise left join with sub-query in large dataset (query taking way too long)
I have a large dataset, and I have to do 3 joins, one of which is a sub-query. I chose to use a subquery instead of WHERE (IN or FIND_IN_SET) so that I don't lose values from my left, or base, table. I need all of the rows from the left table. Overall, I'm matching 11 million values against 900,000 values, so I expect this to take a while, but it took ~20 seconds even on a set of 200.
The engine is InnoDB, and each table has a primary key (IDvar).
I use the sub-query because I have too many values that I need to select from (val1, val2,..., val100) and I want to avoid using the 'AND' command with a clause for every 'val'.
The query I am using is:
SELECT *
FROM table1
LEFT JOIN (SELECT * FROM table2 WHERE table2.var IN(val1, val2,..., val100)) AS t
USING (IDvar)
LEFT JOIN table3
USING (IDvar);
The query looks fine to me. You'd want the following indexes:
create index idx_t1 on table1(idvar);
create index idx_t2 on table2(var, idvar);
create index idx_t3 on table3(idvar);
(Maybe it's just the second one that's missing.)
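One way to check whether a composite index such as idx_t2 is actually picked up is to look at the query plan before and after creating it. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (where the corresponding statement is EXPLAIN), and the helper name plan_for_filter and the payload column are hypothetical. The plan text is specific to SQLite and will look different in MySQL.

```python
import sqlite3


def plan_for_filter(create_index):
    """Return SQLite's plan lines for the IN filter on table2.

    SQLite stands in for MySQL here; only the idea (compare the plan
    with and without the composite index) carries over.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical minimal schema: the thread only names IDvar and var.
    conn.execute(
        "CREATE TABLE table2 (IDvar INTEGER PRIMARY KEY, var INTEGER, payload TEXT)"
    )
    if create_index:
        conn.execute("CREATE INDEX idx_t2 ON table2(var, IDvar)")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM table2 WHERE var IN (1, 2, 3)"
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return [row[-1] for row in rows]


if __name__ == "__main__":
    print("without index:", plan_for_filter(False))
    print("with index:   ", plan_for_filter(True))
```

If the plan for the real query in MySQL does not mention the new index, the index is probably not helping that query.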
May I ask why you do not use an ON clause?
In general, when I do joins, I do the following:
SELECT *
FROM table1 JOIN table2 ON table1.common_var = table2.common_var
JOIN table3 ON table1.common_var2 = table3.common_var2
WHERE ...;
so that there is no need to load the whole huge table.
If there is a need to get every possible combination of the two tables, we can fetch the two tables separately and build the combinations programmatically.
SELECT * FROM table1;
SELECT * FROM table2;
... the rest in another program ...
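As a rough sketch of what "the rest in another program" could look like, the Python below joins two already-fetched row lists on a shared key with a hash lookup. Reading "combinations" as a left join on IDvar is an assumption taken from the question, and the function name left_join and the sample rows are hypothetical. With 11 million rows this also assumes the smaller table fits in memory.

```python
from collections import defaultdict


def left_join(left_rows, right_rows, key):
    """Left-join two lists of dict rows on `key`.

    Returns (left_row, right_row) pairs; right_row is None when the
    left row has no match, so no left row is lost.
    """
    # Build a lookup from key value to all right rows carrying it.
    by_key = defaultdict(list)
    for row in right_rows:
        by_key[row[key]].append(row)

    joined = []
    for left in left_rows:
        matches = by_key.get(left[key])
        if matches:
            for right in matches:
                joined.append((left, right))
        else:
            joined.append((left, None))
    return joined


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two SELECT results.
    table1 = [{"IDvar": 1}, {"IDvar": 2}, {"IDvar": 3}]
    table2 = [
        {"IDvar": 1, "var": "a"},
        {"IDvar": 1, "var": "b"},
        {"IDvar": 3, "var": "c"},
    ]
    for pair in left_join(table1, table2, "IDvar"):
        print(pair)
```

Whether this beats letting the database do the join depends on the data. It moves every row over the connection first, which is usually the expensive part.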
Won't this do the same task? And possibly be more efficient?
SELECT t1.*, t2.*, t3.*
FROM table1 AS t1
LEFT JOIN table2 AS t2 USING (IDvar)
LEFT JOIN table3 AS t3 USING (IDvar)
WHERE t2.var IN(val1, val2,..., val100);
Indexes needed:
t2: (IDvar, var) -- in this order
t3: (IDvar)
No index on t1 will be used.
Whether or not the join is a LEFT join makes a big difference in this query.
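To make the effect of LEFT concrete, the sketch below runs three variants on a tiny made-up dataset: the filter in a derived table (as in the question), the filter in the ON clause, and the filter in WHERE. It uses Python's built-in sqlite3 module as a stand-in for MySQL, and the sample rows and the function name run_variants are hypothetical. The row-preservation behaviour shown is standard SQL semantics: a WHERE condition on the right-hand table discards the NULL-extended rows, so the LEFT JOIN then behaves like an inner join.

```python
import sqlite3


def run_variants():
    """Run three placements of the IN filter and return their result rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: IDvar 2 has a non-matching var, IDvar 3 has no table2 row.
    conn.executescript("""
        CREATE TABLE table1 (IDvar INTEGER PRIMARY KEY);
        CREATE TABLE table2 (IDvar INTEGER PRIMARY KEY, var TEXT);
        INSERT INTO table1 VALUES (1), (2), (3);
        INSERT INTO table2 VALUES (1, 'val1'), (2, 'other');
    """)
    derived = conn.execute("""
        SELECT table1.IDvar, t.var
        FROM table1
        LEFT JOIN (SELECT * FROM table2 WHERE var IN ('val1', 'val2')) AS t
        USING (IDvar)
        ORDER BY table1.IDvar
    """).fetchall()
    on_filter = conn.execute("""
        SELECT t1.IDvar, t2.var
        FROM table1 AS t1
        LEFT JOIN table2 AS t2
          ON t2.IDvar = t1.IDvar AND t2.var IN ('val1', 'val2')
        ORDER BY t1.IDvar
    """).fetchall()
    where_filter = conn.execute("""
        SELECT t1.IDvar, t2.var
        FROM table1 AS t1
        LEFT JOIN table2 AS t2 ON t2.IDvar = t1.IDvar
        WHERE t2.var IN ('val1', 'val2')
        ORDER BY t1.IDvar
    """).fetchall()
    conn.close()
    return derived, on_filter, where_filter


if __name__ == "__main__":
    derived, on_filter, where_filter = run_variants()
    print("derived table:", derived)       # all three table1 rows kept
    print("filter in ON: ", on_filter)     # all three table1 rows kept
    print("filter in WHERE:", where_filter)  # only the matching row remains
```

So if every row of table1 must appear in the result, the filter on table2 belongs in the derived table or in the ON clause, not in WHERE.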