Why is UNION much faster than LEFT JOIN with OR?

I have a fairly complex query that I would really like to structure with LEFT JOINs and no UNION statements, but it runs too slowly. Even after simplifying it to isolate the issue, I don't understand why one form of the query runs so much faster than the other.

I'm using MySQL version 5.6.36-82.1-log.

Is there any way I can optimize this query without using UNION?

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.422 seconds

When I split it into two queries combined with UNION, it's much faster:

(select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `cities`.`name` = 'New York')
union
(select distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `locations`.`description` like '%New York%')

Run time: 0.219 seconds

If I change `left join` to (inner) `join`, it's much faster, but it omits locations that have no address:

select SQL_NO_CACHE distinct `locations`.* from `locations` 
join `location_address` on `location_address`.`location_id` = `locations`.`id` 
join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 0.219 seconds

Also, adding the `cities`.`name` condition to the LEFT JOIN's ON clause doesn't help:

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` AND `cities`.`name` = 'New York'
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.812 seconds

The approximate row counts in each table are:

  • locations: ~5,000 rows
  • location_address: ~4,900 rows (~100 locations have 2 entries, ~200 locations have none)
  • addresses: ~5,500 rows (~600 addresses are linked from other tables)
  • cities: ~30,000 rows (a full database of US cities)

The `id` field on each table is the primary key, and `cities`.`name` is also indexed. `locations`.`index` is a long TEXT field.

Here is some example structure and data:

locations

+----+----------------------+
| id | description          |
+----+----------------------+
| 1  | Somewhere out there  |
+----+----------------------+
| 2  | In New York          |
+----+----------------------+
| 3  | Elsewhere            |
+----+----------------------+

location_address

+----+-------------+------------+
| id | location_id | address_id |
+----+-------------+------------+
| 1  | 1           | 1          |
+----+-------------+------------+
| 2  | 1           | 2          |
+----+-------------+------------+
| 3  | 3           | 3          |
+----+-------------+------------+

addresses

+----+---------+
| id | city_id |
+----+---------+
| 1  | 1       |
+----+---------+
| 2  | 2       |
+----+---------+
| 3  | 2       |
+----+---------+

cities

+----+-----------+
| id | name      |
+----+-----------+
| 1  | New York  |
+----+-----------+
| 2  | Chicago   |
+----+-----------+
| 3  | Houston   |
+----+-----------+
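To check two claims above against the sample data (the OR query and the UNION rewrite return the same locations, and switching to inner joins silently drops location 2, which has no address), here is a small sketch that loads the tables into SQLite through Python's sqlite3 module. SQLite standing in for MySQL is an assumption made only so the check runs anywhere; it says nothing about MySQL's timings, and MySQL-specific bits (SQL_NO_CACHE, parentheses around UNION members) are dropped.

```python
import sqlite3

# The question's sample data, loaded into SQLite as a stand-in for MySQL.
SAMPLE = """
create table locations (id integer primary key, description text);
create table location_address (id integer primary key, location_id integer, address_id integer);
create table addresses (id integer primary key, city_id integer);
create table cities (id integer primary key, name text);
insert into locations values (1, 'Somewhere out there'), (2, 'In New York'), (3, 'Elsewhere');
insert into location_address values (1, 1, 1), (2, 1, 2), (3, 3, 3);
insert into addresses values (1, 1), (2, 2), (3, 2);
insert into cities values (1, 'New York'), (2, 'Chicago'), (3, 'Houston');
"""


def joins(kind):
    # kind is "left join" or "join" (inner join)
    return (f"\nfrom locations\n"
            f"{kind} location_address on location_address.location_id = locations.id\n"
            f"{kind} addresses on location_address.address_id = addresses.id\n"
            f"{kind} cities on addresses.city_id = cities.id\n")


BY_CITY = "where cities.name = 'New York'"
BY_DESC = "where locations.description like '%New York%'"
EITHER = "where cities.name = 'New York' or locations.description like '%New York%'"

OR_LEFT = "select distinct locations.*" + joins("left join") + EITHER
OR_INNER = "select distinct locations.*" + joins("join") + EITHER
# SQLite rejects parentheses around UNION members, so they are omitted here.
UNION_LEFT = ("select distinct locations.*" + joins("left join") + BY_CITY
              + "\nunion\n"
              + "select distinct locations.*" + joins("left join") + BY_DESC)

conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE)


def run(sql):
    # Sort so the comparison does not depend on the engine's row order.
    return sorted(conn.execute(sql).fetchall())


print(run(OR_LEFT))     # [(1, 'Somewhere out there'), (2, 'In New York')]
print(run(UNION_LEFT))  # the same two rows
print(run(OR_INNER))    # [(1, 'Somewhere out there')]: location 2 has no address row
```

Only correctness is compared here; the speed difference depends on MySQL's optimizer and the real table sizes.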

I really want to avoid UNION because I have a lot of conditional filters, and sometimes I have to omit part of the union because I only want locations that have addresses. Using UNION has also significantly increased the complexity of my query-building code. I'd also like to avoid subqueries.

You could write the query like so:

select *
from
(
    <select statement a>
    UNION
    <select statement b>
) x
where <extra where clauses on x's columns here>

You'd probably put the least restrictive clauses in the two unioned inner selects and then add extra restrictions on the result. I think this would give you the most flexibility.
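Since the asker's main worry is query-building complexity, the derived-table pattern can be generated by a small helper. The sketch below uses a hypothetical `wrap_union` function (not from the answer or any library): it unions whichever SELECTs are supplied, so one half can simply be left out, and ANDs any extra conditions onto the outer query. It runs against a tiny SQLite table only to make it testable.

```python
import sqlite3


def wrap_union(selects, extra_conditions=()):
    """Hypothetical query-builder helper: UNION the given SELECTs inside a
    derived table x, then AND any extra conditions onto the combined result.

    Conditions are raw SQL fragments; pass user values as ? parameters to
    execute() rather than formatting them into the strings.
    """
    if not selects:
        raise ValueError("at least one SELECT is required")
    # A single SELECT is allowed, so one half of the union can be omitted.
    sql = "select x.* from (\n" + "\nunion\n".join(selects) + "\n) x"
    if extra_conditions:
        sql += "\nwhere " + " and ".join(f"({c})" for c in extra_conditions)
    return sql


# Usage against a tiny stand-in table (SQLite instead of MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table locations (id integer primary key, description text);
insert into locations values (1, 'Somewhere out there'), (2, 'In New York'), (3, 'Elsewhere');
""")
by_id = "select * from locations where id = 1"
by_desc = "select * from locations where description like '%New York%'"

both = conn.execute(wrap_union([by_id, by_desc], ["x.id > ?"]), (1,)).fetchall()
one_half = conn.execute(wrap_union([by_desc])).fetchall()
print(both)      # [(2, 'In New York')]
print(one_half)  # [(2, 'In New York')]
```

Keeping each half as a plain SELECT string means the conditional logic lives in choosing which halves to pass, rather than in editing one large statement.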

If you look at the execution plans, you'll see that they are different. The issue is probably that indexes can be used more effectively in each of the two UNION subqueries. However, database optimizers are notoriously poor at optimizing `OR` conditions.
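One way to look at the plan difference the answer mentions. In MySQL you would prefix each query with EXPLAIN; to keep this sketch runnable without a server it uses SQLite's EXPLAIN QUERY PLAN on the question's schema (no data needed), so the plan text is SQLite's, not MySQL's, and only illustrates that the two query shapes are planned differently.

```python
import sqlite3

# The question's schema, including the index on cities.name.
SCHEMA = """
create table locations (id integer primary key, description text);
create table location_address (id integer primary key, location_id integer, address_id integer);
create table addresses (id integer primary key, city_id integer);
create table cities (id integer primary key, name text);
create index cities_name on cities (name);
"""

JOINS = """
from locations
left join location_address on location_address.location_id = locations.id
left join addresses on location_address.address_id = addresses.id
left join cities on addresses.city_id = cities.id
"""

OR_QUERY = ("select distinct locations.* " + JOINS
            + "where cities.name = 'New York' or locations.description like '%New York%'")
UNION_QUERY = ("select distinct locations.* " + JOINS + "where cities.name = 'New York'\n"
               "union\n"
               "select distinct locations.* " + JOINS
               + "where locations.description like '%New York%'")


def plan(conn, sql):
    # Each EXPLAIN QUERY PLAN row has four columns; the last is the readable step.
    return [row[3] for row in conn.execute("explain query plan " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
or_plan = plan(conn, OR_QUERY)
union_plan = plan(conn, UNION_QUERY)
for label, steps in (("OR", or_plan), ("UNION", union_plan)):
    print(label)
    for step in steps:
        print("   ", step)
```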

By the way, how does this version perform?

select SQL_NO_CACHE l.*
from locations l
where exists (select 1
              from location_address la join
                   addresses a
                   on la.address_id = a.id join
                   cities c
                   on a.city_id = c.id
              where la.location_id = l.id and c.name = 'New York'
             ) or
     l.description like '%New York%';

You should be able to optimize this subquery so that it runs fast. Plus, you won't incur the overhead of removing duplicates.
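A sketch of the duplicate-removal point, using SQLite and the question's sample data plus one hypothetical extra row (a second New York address, id 4, linked to location 1, added only so that a join can return the same location twice). Without DISTINCT, the LEFT JOIN version returns location 1 once per matching address; the EXISTS version returns each location at most once.

```python
import sqlite3

# Sample data from the question; the (4, ...) rows in location_address and
# addresses are HYPOTHETICAL, added to make duplicates visible.
SAMPLE = """
create table locations (id integer primary key, description text);
create table location_address (id integer primary key, location_id integer, address_id integer);
create table addresses (id integer primary key, city_id integer);
create table cities (id integer primary key, name text);
insert into locations values (1, 'Somewhere out there'), (2, 'In New York'), (3, 'Elsewhere');
insert into location_address values (1, 1, 1), (2, 1, 2), (3, 3, 3), (4, 1, 4);
insert into addresses values (1, 1), (2, 2), (3, 2), (4, 1);
insert into cities values (1, 'New York'), (2, 'Chicago'), (3, 'Houston');
"""

JOIN_NO_DISTINCT = """
select locations.* from locations
left join location_address on location_address.location_id = locations.id
left join addresses on location_address.address_id = addresses.id
left join cities on addresses.city_id = cities.id
where cities.name = 'New York' or locations.description like '%New York%'
"""

# The answer's EXISTS version, without the MySQL-only SQL_NO_CACHE hint.
EXISTS_QUERY = """
select l.* from locations l
where exists (select 1
              from location_address la
              join addresses a on la.address_id = a.id
              join cities c on a.city_id = c.id
              where la.location_id = l.id and c.name = 'New York')
   or l.description like '%New York%'
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SAMPLE)
join_rows = sorted(conn.execute(JOIN_NO_DISTINCT).fetchall())
exists_rows = sorted(conn.execute(EXISTS_QUERY).fetchall())
print(join_rows)    # location 1 appears twice, once per New York address
print(exists_rows)  # each location at most once, no DISTINCT needed
```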

For performance, the EXISTS version can use indexes on location_address(location_id), addresses(id, city_id), and cities(id, name).
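The suggested indexes written out as CREATE INDEX statements. The index names are hypothetical, and the statements are run against SQLite here only to check that they parse; the same `CREATE INDEX name ON table (columns)` form is also accepted by MySQL.

```python
import sqlite3

# Index DDL for the columns suggested above; the names are hypothetical.
INDEXES = [
    "create index idx_location_address_location on location_address (location_id)",
    "create index idx_addresses_id_city on addresses (id, city_id)",
    "create index idx_cities_id_name on cities (id, name)",
]

# Just the tables the indexes touch, matching the question's schema.
SCHEMA = """
create table location_address (id integer primary key, location_id integer, address_id integer);
create table addresses (id integer primary key, city_id integer);
create table cities (id integer primary key, name text);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for ddl in INDEXES:
    conn.execute(ddl)

# List the indexes that now exist, to confirm each statement was applied.
names = sorted(row[0] for row in conn.execute(
    "select name from sqlite_master where type = 'index' and name like 'idx%'"))
print(names)
```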

I managed to solve the problem by adding an index to the pivot table:

ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);

Run time: 0.188 seconds

It's slightly faster than the UNION method.
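A deliberately simplified model of why that one index matters so much. It assumes that, without an index on `location_address`.`location_id`, the LEFT JOIN ends up comparing every location against every `location_address` row; that is an assumption about the plan, not something taken from the question's EXPLAIN output. The link rows are synthetic, shaped only to match the counts the question gives, and the dict plays the role of `location_id_index`.

```python
from collections import defaultdict


def synthetic_links():
    """HYPOTHETICAL (link_id, location_id) rows shaped like the question's counts:
    locations 1-200 have no address, 201-300 have two, 301-5000 have one."""
    links = []
    for loc in range(201, 5001):
        for _ in range(2 if loc <= 300 else 1):
            links.append((len(links) + 1, loc))
    return links


def rows_examined_full_scan(location_ids, links):
    # No index on location_id: each location is compared with every link row.
    return len(location_ids) * len(links)


def rows_examined_with_index(location_ids, links):
    # The dict stands in for location_id_index: one lookup per location
    # touches only that location's own link rows.
    index = defaultdict(list)
    for link_id, loc in links:
        index[loc].append(link_id)
    return sum(len(index.get(loc, [])) for loc in location_ids)


locations = list(range(1, 5001))
links = synthetic_links()
print(len(links))                                  # 4900
print(rows_examined_full_scan(locations, links))   # 24500000
print(rows_examined_with_index(locations, links))  # 4900
```

Roughly 24.5 million row comparisons versus about 4,900 row fetches. Real MySQL costs differ (join buffers, page reads, the extra joins), so treat this only as an illustration of the difference in scale.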

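Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.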

 
© 2020-2024 STACKOOM.COM