Optimising many-to-many query in MySQL

I have a table called 'items' that looks something like this...

id | name
––––––––––––
1  | APPLES 
2  | BANANAS
3  | ORANGES
4  | PEARS

... and a junction table called 'pairs', creating many-to-many relationships between the items...

id | item1_id | item2_id
––––––––––––––––––––––––
1  | 1        |  2 
2  | 1        |  4
3  | 2        |  3
4  | 2        |  4
5  | 4        |  3

I have the following query to find items which are paired with a given item...

SELECT * FROM items i
WHERE
  i.id IN (SELECT item1_id FROM pairs WHERE item2_id = 4)
OR
  i.id IN (SELECT item2_id FROM pairs WHERE item1_id = 4)

Returning something like...

id | name
––––––––––––
1  | APPLES
2  | BANANAS
3  | ORANGES

...which does the job. However, it runs pretty slowly: with a small test dataset of approximately 100 items and 1000 pairings, it's already taking about 75 ms.

My question is: can this be optimised further to speed it up (e.g. using joins rather than nested queries)?

Thanks for any help.

I think it will be sufficient to have two separate composite indexes: one on pairs(item2_id, item1_id) and one on pairs(item1_id, item2_id).
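As a minimal sketch, the two indexes could be created like this. The index names are hypothetical and chosen only for illustration; the table and column names come from the question.

```sql
-- Serves the lookup "WHERE item2_id = ? AND item1_id = ?";
-- item2_id leads, so the filter on item2_id = 4 can seek directly.
CREATE INDEX idx_pairs_item2_item1 ON pairs (item2_id, item1_id);

-- Serves the mirror lookup "WHERE item1_id = ? AND item2_id = ?".
CREATE INDEX idx_pairs_item1_item2 ON pairs (item1_id, item2_id);
```

Because each index contains both columns, either lookup can in principle be satisfied from the index alone, without reading the table rows.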

However, MySQL is sometimes funky about optimizing IN with subqueries. I would write this using EXISTS:

SELECT i.*
FROM items i
WHERE EXISTS (SELECT 1
              FROM pairs p
              WHERE p.item2_id = 4 AND p.item1_id = i.id
             ) OR
      EXISTS (SELECT 1
              FROM pairs p
              WHERE p.item1_id = 4 AND p.item2_id = i.id
             );

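With the two indexes above in place, each EXISTS subquery should be able to use an index for its lookup; running EXPLAIN on the query will confirm whether it does.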
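One way to check this, sketched here on the assumption that indexes on the pairs columns already exist, is to prefix the query with EXPLAIN. The exact output depends on the MySQL version and on the data, so no expected output is shown.

```sql
-- In the EXPLAIN output, the `key` column names the index chosen for each
-- table access; NULL there means no index was used for that step.
EXPLAIN
SELECT i.*
FROM items i
WHERE EXISTS (SELECT 1
              FROM pairs p
              WHERE p.item2_id = 4 AND p.item1_id = i.id
             ) OR
      EXISTS (SELECT 1
              FROM pairs p
              WHERE p.item1_id = 4 AND p.item2_id = i.id
             );
```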

The internal query optimizer does a great job of creating an execution plan, although you can still look at the plan to identify bottlenecks. Expressing the same query in a different way generally doesn't make a huge difference in the end. You'd be surprised at how well the optimizer handles even really crazy-looking queries, and how two seemingly different expressions of the same query ultimately lead to the same plan. Changing this one to use joins instead will probably lead to the same or a similar execution plan.
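For comparison, here is one possible join-based form of the query from the question. It is a sketch, not necessarily a faster version; comparing its EXPLAIN output with that of the original is the way to see whether the plan actually differs.

```sql
-- Items that appear as item1 in a pair whose item2 is the given item...
SELECT i.*
FROM items i
JOIN pairs p ON p.item1_id = i.id
WHERE p.item2_id = 4
UNION
-- ...plus items that appear as item2 in a pair whose item1 is the given item.
SELECT i.*
FROM items i
JOIN pairs p ON p.item2_id = i.id
WHERE p.item1_id = 4;
```

UNION (rather than UNION ALL) removes duplicate rows, so an item paired with item 4 in both directions is still returned once, as with the IN version.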

So what I would do first is create an index on your item1_id column and a separate index on your item2_id column. This will help improve the performance of those WHERE clauses. Then, if that still doesn't meet your requirements, have a look at the Optimization chapter in the MySQL docs (for whichever version of MySQL you are using) for a full run-down of possible strategies. Note that it will benefit you to avoid heavy optimization too early, especially if your application is complex. Once your application is in a mostly working state, you'll be in a better position to identify and address bottlenecks. But indexes are an easy and worthwhile first step at any development stage.
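A minimal sketch of that first step follows. The index names are hypothetical; the table and column names come from the question.

```sql
-- One single-column index per side of the pairing, so that both
-- "WHERE item1_id = 4" and "WHERE item2_id = 4" can use an index.
CREATE INDEX idx_pairs_item1 ON pairs (item1_id);
CREATE INDEX idx_pairs_item2 ON pairs (item2_id);

-- List the indexes now defined on the table to confirm they exist.
SHOW INDEX FROM pairs;
```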
