Why is UNION much faster than LEFT JOIN with OR?

I have a fairly complex query that I really want to structure using LEFT JOIN without any UNION statements, but it runs too slowly. Even when I simplify it to isolate the issue, I don't understand why one version should run so much faster than the other.

I'm using MySQL version: 5.6.36-82.1-log

Is there any way I can optimize this query without using UNION?

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.422 seconds

When I split this and use a UNION, it's much faster:

(select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `cities`.`name` = 'New York')
union
(select distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` 
where `locations`.`description` like '%New York%')

Run time: 0.219 seconds

If I change 'left join' to (inner) 'join', it's much faster (but omits locations with no address):

select SQL_NO_CACHE distinct `locations`.* from `locations` 
join `location_address` on `location_address`.`location_id` = `locations`.`id` 
join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 0.219 seconds

Also, adding the cities.name condition to the LEFT JOIN doesn't help:

select SQL_NO_CACHE distinct `locations`.* from `locations` 
left join `location_address` on `location_address`.`location_id` = `locations`.`id` 
left join `addresses` on `location_address`.`address_id` = `addresses`.`id` 
left join `cities` on `addresses`.`city_id` = `cities`.`id` AND `cities`.`name` = 'New York'
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'

Run time: 13.812 seconds

The entries in each table are:

  • locations: ~5000 rows
  • location_address: ~4900 rows (~100 locations have 2 entries, ~200 locations have 0)
  • addresses: ~5500 rows (~600 addresses are linked from other tables)
  • cities: ~30,000 rows (using a full database of US cities)

The id field on each table is the primary key, and cities.name is also indexed. locations.description is a long TEXT field.

Here is some example structure and data:

locations

+----+----------------------+
| id | description          |
+----+----------------------+
| 1  | Somewhere out there  |
+----+----------------------+
| 2  | In New York          |
+----+----------------------+
| 3  | Elsewhere            |
+----+----------------------+

location_address

+----+-------------+------------+
| id | location_id | address_id |
+----+-------------+------------+
| 1  | 1           | 1          |
+----+-------------+------------+
| 2  | 1           | 2          |
+----+-------------+------------+
| 3  | 3           | 3          |
+----+-------------+------------+

addresses

+----+---------+
| id | city_id |
+----+---------+
| 1  | 1       |
+----+---------+
| 2  | 2       |
+----+---------+
| 3  | 2       |
+----+---------+

cities

+----+-----------+
| id | name      |
+----+-----------+
| 1  | New York  |
+----+-----------+
| 2  | Chicago   |
+----+-----------+
| 3  | Houston   |
+----+-----------+

I really want to avoid using UNION because I have a lot of conditional filters, and sometimes I have to omit part of the union when I only want locations with addresses. Using UNION has also significantly increased the complexity of my query-building code. I'd also like to avoid subqueries.

You could write the query like so:

select *
from
(
    Select <sql statement a>
    UNION
    Select <sql statement b>
) x
where x.<extra where clauses here>

You'd probably put the least restrictive clauses in the two unioned inner selects, and then add extra restrictions on the result. This would allow the most flexibility, I think.
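Applied to the tables in the question, that pattern might look roughly like the sketch below. The outer filter on x.description is a made-up example, not something from the question. The first branch uses inner joins because its city filter discards rows without a matching city anyway, and the second branch needs no joins at all.

```sql
-- Sketch only: the outer "not like '%closed%'" filter is hypothetical.
select x.*
from
(
    -- Branch a: locations whose address is in the city.
    -- The filter on c.name rejects NULLs, so inner joins return the same rows.
    select l.*
    from locations l
    join location_address la on la.location_id = l.id
    join addresses a on la.address_id = a.id
    join cities c on a.city_id = c.id
    where c.name = 'New York'
    union
    -- Branch b: locations that mention the city in their description.
    select l.*
    from locations l
    where l.description like '%New York%'
) x
where x.description not like '%closed%';
```

UNION (as opposed to UNION ALL) already removes duplicate rows, so no DISTINCT is needed inside the branches.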

If you look at the execution plans, you'll see that they are different. The likely cause is that each subquery in the UNION can use indexes effectively on its own, whereas database optimizers are notoriously poor at optimizing OR conditions.
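One way to compare the plans is to prefix each statement with EXPLAIN, as in the sketch below, which uses the queries from the question. What the output looks like depends on your data and indexes, so treat the hints in the comments as things to look for rather than guaranteed results.

```sql
-- Plan for the slow version with OR.
-- Look for a table with type = ALL and a large "rows" estimate
-- that is visited once per row of the table before it.
EXPLAIN
select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%';

-- Plan for the first branch of the UNION, for comparison.
-- The join order and the "key" column may differ from the plan above.
EXPLAIN
select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York';
```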

By the way, how does this version perform?

select SQL_NO_CACHE l.*
from locations l
where exists (select 1
              from location_address la join
                   addresses a
                   on la.address_id = a.id join
                   cities c
                   on a.city_id = c.id
              where la.location_id = l.id and c.name = 'New York'
             ) or
     l.description like '%New York%';

You should be able to optimize this subquery so it works fast. Plus, you won't be incurring overhead to remove duplicates.

For performance, this can use indexes on location_address(location_id), addresses(id, city_id), and cities(id, name).
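As a rough sketch, those indexes could be created as follows. The index names are placeholders I made up. Since id is already the primary key on each table, the second and third indexes may add little (particularly if the tables are InnoDB, which is an assumption here), so it is worth measuring before keeping them.

```sql
-- Index names below are hypothetical; pick names that fit your schema.

-- Lets the subquery find the rows for one location without scanning the pivot table.
ALTER TABLE `location_address` ADD INDEX `idx_la_location_id` (`location_id`);

-- Composite indexes so the joins can be answered from the index alone.
ALTER TABLE `addresses` ADD INDEX `idx_addresses_id_city` (`id`, `city_id`);
ALTER TABLE `cities` ADD INDEX `idx_cities_id_name` (`id`, `name`);
```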

I managed to solve the problem by adding an index to the pivot table:

ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);

Run time: 0.188 seconds

It's slightly faster than using the UNION method.
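If you want to check whether a pivot table like this one already has an index on the join column, something along these lines should show it. With only a primary key on id, a lookup by location_id has no index to use.

```sql
-- Lists every index on the pivot table, one row per indexed column.
SHOW INDEX FROM `location_address`;

-- Shows the full table definition, including keys.
SHOW CREATE TABLE `location_address`;
```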

