How to optimize an "IN (SELECT ...)" query

Question

I'm trying to select from two tables: table_a has 600 million rows, while table_b has only 20.

The code currently looks something like this:

        SELECT
            field_1,field_2
        FROM
            table_a
        WHERE
             table_a.field_3 IN (SELECT field_3 FROM table_b WHERE field_4 LIKE 'some_phrase%')

It works but is very slow. I guess that is because it has to check each row against the subquery in the WHERE clause. I thought I could store the values from the subquery in a variable and use that instead of a nested select, but I cannot make it work. I was thinking about something like this:

        SELECT @myVariable := field_3 FROM table_b WHERE field_4 LIKE 'some_phrase%';

        SELECT
            field_1,field_2
        FROM
            table_a
        WHERE
             table_a.field_3 IN (@myVariable)

I learned that this won't work with IN(), so I also tried FIND_IN_SET, but I couldn't make that work either. I would appreciate any help.

Answer 1

Instead of an IN clause, you could use a JOIN on the subquery:

  SELECT field_1, field_2
  FROM table_a
  INNER JOIN (
    SELECT field_3
    FROM table_b
    WHERE field_4 LIKE 'some_phrase%'
  ) t ON t.field_3 = table_a.field_3

But be sure you have a proper index on column field_3 of table_b and on column field_3 of table_a.
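
As a rough sketch of the indexing advice above, the two indexes might be created as follows. The index names idx_a_field_3 and idx_b_field_3 are made up for illustration, and this assumes no suitable index exists yet. On a 600-million-row table the first statement can take a long time to build.

```sql
-- Hypothetical index names; adjust to your own naming scheme.
-- Lets the join look up matching table_a rows by field_3
-- instead of scanning the whole table.
CREATE INDEX idx_a_field_3 ON table_a (field_3);

-- Index on the join column of the small table, as suggested above.
CREATE INDEX idx_b_field_3 ON table_b (field_3);
```

With only 20 rows in table_b, the index on table_a is likely the one that matters most.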

Answer 2

Actually, assuming the subquery on table_b is not particularly large or slow, you might want to focus on optimizing the outer query on table_a. Adding an appropriate index is one option, such as:

CREATE INDEX idx ON table_a (field_3, field_1, field_2);

This index should completely cover the WHERE and SELECT clauses. Note that in the case of the subquery, MySQL would just evaluate it once and cache the result set somewhere. If the subquery is very large, then you might want to rewrite the query using a join:

SELECT DISTINCT a.field_1, a.field_2
FROM table_a a
INNER JOIN table_b b
    ON a.field_3 = b.field_3
WHERE
    b.field_4 LIKE 'some_phrase%';

Here the following additional index might help:

CREATE INDEX idx2 ON table_b (field_4, field_3);
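
To check whether the indexes are actually used, one option is to prefix the query with EXPLAIN. A minimal sketch, assuming the idx and idx2 indexes above have been created:

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT DISTINCT a.field_1, a.field_2
FROM table_a a
INNER JOIN table_b b
    ON a.field_3 = b.field_3
WHERE
    b.field_4 LIKE 'some_phrase%';
```

In the output, the key column shows which index MySQL chose for each table, and "Using index" in the Extra column indicates that the index covers the query without reading the table rows. Note also that DISTINCT collapses duplicate (field_1, field_2) pairs that the original IN query would have returned, so the two queries give the same result only if such duplicates do not matter.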
