简体   繁体   中英

Optimising many-to-many query in MySQL

I have a table called ' items ' that looks something like this...

id | name
––––––––––––
1  | APPLES 
2  | BANANAS
3  | ORANGES
4  | PEARS

... and a junction table called ' pairs ', creating many-to-many relationships between the items...

id | item1_id | item2_id
––––––––––––––––––––––––
1  | 1        |  2 
2  | 1        |  4
3  | 2        |  3
4  | 2        |  4
5  | 4        |  3

I have the following query to find items which are paired with a given item...

SELECT * FROM items i
WHERE
  i.id IN (SELECT item1_id FROM pairs WHERE item2_id = 4)
OR
  i.id IN (SELECT item2_id FROM pairs WHERE item1_id = 4)

Returning something like...

id | name
––––––––––––
1  | APPLES 
3  | ORANGES

...which does the job, however, it runs pretty slowly (with a small test dataset of approximately 100 items, 1000 pairings it's already taking about 75ms).

My question is – can this be optimised further to speed it up (eg using joins rather than nested queries)?

Thanks for any help.

I think it will be sufficient to have indexes on pairs(item2_id, item1_id) and pairs(item1_id, item2_id) -- two separate indexes.

However, MySQL is sometimes funky about optimizing IN with subqueries. I would write this using exists :

SELECT i.*
FROM items i
WHERE EXISTS (SELECT 1
              FROM pairs p
              WHERE p.item2_id = 4 AND p.item1_id = i.id
             ) OR
      EXISTS (SELECT 1
              FROM pairs p
              WHERE p.item1_id = 4 AND p.item2_id = i.id
             );

These are guaranteed to use the indexes.

The internal query optimizer does a great job at creating an execution plan, although you can look at the plan and identify bottlenecks. Things like expressing the same query in a different way generally don't make a huge difference at the end of the day. Even queries that are really crazy looking, you'd be surprised at how well the optimizer handles them and how two seemingly different expressions of the same query ultimately lead to the same thing. Changing that to use joins instead will probably lead to the same or similar execution plan.

So what I would do first is to create an index on your item1_id column, and a separate index on your item2_id column. This will help improve performance of those where clauses. Then, if that still doesn't meet your requirements, have a look at the Optimization chapter in the MySQL docs (for whichever version of MySQL you are using) for a full run-down of possible strategies. Note that it will benefit you to avoid heavy optimizations prematurely, especially if your application is complex. Once your application is in a mostly working state, you'll be in a better position to identify and address bottlenecks. But indices are always an easy and worthwhile first step at any development stage.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM