I have a fairly complex query that I really want to structure using LEFT JOIN without any UNION statements, but it runs too slowly. Even when I simplify it to isolate the issue, I don't understand why one version runs so much faster than the other.
I'm using MySQL version: 5.6.36-82.1-log
Is there any way I can optimize this query without using UNION?
select SQL_NO_CACHE distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'
Run time: 13.422 seconds
When I split this and use a UNION, it's much faster:
(select SQL_NO_CACHE distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York')
union
(select distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id`
where `locations`.`description` like '%New York%')
Run time: 0.219 seconds
If I change 'left join' to (inner) 'join', it's much faster (but omits locations with no address):
select SQL_NO_CACHE distinct `locations`.* from `locations`
join `location_address` on `location_address`.`location_id` = `locations`.`id`
join `addresses` on `location_address`.`address_id` = `addresses`.`id`
join `cities` on `addresses`.`city_id` = `cities`.`id`
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'
Run time: 0.219 seconds
Also, adding the `cities`.`name` condition to the LEFT JOIN doesn't help:
select SQL_NO_CACHE distinct `locations`.* from `locations`
left join `location_address` on `location_address`.`location_id` = `locations`.`id`
left join `addresses` on `location_address`.`address_id` = `addresses`.`id`
left join `cities` on `addresses`.`city_id` = `cities`.`id` AND `cities`.`name` = 'New York'
where `cities`.`name` = 'New York'
or `locations`.`description` like '%New York%'
Run time: 13.812 seconds
The entries in each table are:
The `id` field on each table is the primary key, and `cities`.`name` is also indexed. `locations`.`description` is a long TEXT field.
Here is some example structure and data:
locations
+----+---------------------+
| id | description         |
+----+---------------------+
|  1 | Somewhere out there |
+----+---------------------+
|  2 | In New York         |
+----+---------------------+
|  3 | Elsewhere           |
+----+---------------------+
location_address
+----+-------------+------------+
| id | location_id | address_id |
+----+-------------+------------+
|  1 |           1 |          1 |
+----+-------------+------------+
|  2 |           1 |          2 |
+----+-------------+------------+
|  3 |           3 |          3 |
+----+-------------+------------+
addresses
+----+---------+
| id | city_id |
+----+---------+
|  1 |       1 |
+----+---------+
|  2 |       2 |
+----+---------+
|  3 |       2 |
+----+---------+
cities
+----+-----------+
| id | name      |
+----+-----------+
|  1 | New York  |
+----+-----------+
|  2 | Chicago   |
+----+-----------+
|  3 | Houston   |
+----+-----------+
I really want to avoid using UNION because I have a lot of conditional filters, and sometimes I have to omit part of the union when I only want locations with addresses. Using UNION has also significantly increased the complexity of my query-building code. I'd also like to avoid subqueries.
You could write the query like so:
select x.*
from
(
    select <sql statement a>
    union
    select <sql statement b>
) x
where <extra where clauses on x here>
You'd probably put the least restrictive clauses in the two unioned inner selects, and then add extra restrictions on the result. This would allow the most flexibility, I think.
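Applied to the tables in the question, that pattern might look roughly like the sketch below. The final filter on `x.id` is only a made-up placeholder for whatever extra conditions the query builder appends; it is not from the original post.

```sql
-- Sketch only: the trailing filter on x.id is a hypothetical placeholder.
select x.*
from (
    -- Locations with an address in the city. Inner joins are enough here,
    -- because a location without an address can never match on city name.
    select l.*
    from locations l
    join location_address la on la.location_id = l.id
    join addresses a on a.id = la.address_id
    join cities c on c.id = a.city_id
    where c.name = 'New York'
    union
    -- Locations whose description mentions the city, with or without an address.
    select l.*
    from locations l
    where l.description like '%New York%'
) x
where x.id > 0;  -- placeholder for the extra filters
```

Because UNION (without ALL) already removes duplicate rows, neither inner select needs its own DISTINCT.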
If you look at the execution plans, you'll see that they are different. The likely cause is that each half of the UNION can use indexes effectively on its own, whereas database optimizers are notoriously poor at optimizing `OR` conditions.
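To compare the plans, you can prefix each variant with EXPLAIN. A minimal sketch for the slow LEFT JOIN version, using short table aliases that I've added for readability:

```sql
-- Show the plan MySQL chooses for the slow LEFT JOIN + OR query.
explain
select distinct l.*
from locations l
left join location_address la on la.location_id = l.id
left join addresses a on la.address_id = a.id
left join cities c on a.city_id = c.id
where c.name = 'New York'
   or l.description like '%New York%';
```

In the output, the `type`, `key`, `rows`, and `Extra` columns are the ones to look at. A row with `type` = `ALL` and no `key` means that table is being scanned in full for each combination of rows from the preceding tables, which is usually where the time goes.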
By the way, how does this version perform?
select SQL_NO_CACHE l.*
from locations l
where exists (select 1
from location_address la join
addresses a
on la.address_id = a.id join
cities c
on a.city_id = c.id
where la.location_id = l.id and c.name = 'New York'
) or
l.description like '%New York%';
You should be able to optimize this subquery so it works fast. Plus, you won't be incurring overhead to remove duplicates.
For performance, this can use indexes on location_address(location_id), addresses(id, city_id), and cities(id, name).
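As a sketch, those indexes could be created as follows. The index names are hypothetical; pick whatever fits your naming scheme.

```sql
-- Lets the EXISTS subquery find a location's address rows without a table scan.
alter table location_address add index idx_la_location (location_id);

-- Covering indexes for the two lookups by id inside the subquery.
alter table addresses add index idx_addr_id_city (id, city_id);
alter table cities add index idx_cities_id_name (id, name);
```

Since `id` is already the primary key on `addresses` and `cities`, the last two may add little if the tables are InnoDB, where the primary key lookup already reaches the full row. The index on `location_address` is the one I'd expect to matter most, so it's worth adding that one first and measuring.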
I managed to solve the problem by adding an index to the pivot table:
ALTER TABLE `location_address` ADD INDEX `location_id_index` (`location_id` ASC);
Run time: 0.188 seconds
It's slightly faster than using the UNION method.