Optimal join in two MySQL tables

I have a table (T1) with ca. 500,000 non-duplicate records:

ID1    Relation  ID2
4      Rel4      13
5      Rel5       4
13     Rel13     16
16     Rel16     5

I have the properties table T1_Prop:

ID    Entity    
4     Ent4     
5     Ent5
13    Ent13   
16    Ent16  

I want to join these two tables (based on ID 4) in an efficient way, as follows:

 Entity   Relation   Entity
 Ent4      Rel4      Ent13  
 Ent5      Rel5      Ent4

I designed this SELECT statement with JOINs, which works fine. However, I am not sure if this is the best way to do it:

select 
  a.entity, 
  r.relation, 
  b.entity 
from T1 as r 
INNER JOIN T1_Prop as a ON a.ID=r.ID1 AND (r.ID1=4 OR r.ID2=4) 
INNER JOIN T1_Prop as b ON b.ID=r.ID2;

This is a fine use of SQL. It's built for this kind of query.

You'll need two covering indexes on T1 to speed this up. They are:

(ID1, ID2, relation)

and

(ID2, ID1, relation)

The two indexes are for handling the OR clause. It is the only potential performance issue I see, and that's just because OR operations sometimes trick the query planner into doing too much table scanning.
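
As a minimal sketch, the two indexes described above could be created as follows. The index names are hypothetical, and the column names assume the T1 layout shown in the question.

```sql
-- Covering index for lookups by ID1 (index name is illustrative)
CREATE INDEX idx_t1_id1_id2_rel ON T1 (ID1, ID2, Relation);

-- Covering index for lookups by ID2 (index name is illustrative)
CREATE INDEX idx_t1_id2_id1_rel ON T1 (ID2, ID1, Relation);
```

Each index contains all three columns the query reads from T1, so MySQL can in principle answer that part of the query from the index alone, without reading the table rows.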

Try refactoring your query like this to make your selection of ID values more visible:

select   a.entity, r.relation, b.entity 
  from T1 as r 
 INNER JOIN T1_Prop as a ON a.ID=r.ID1  
 INNER JOIN T1_Prop as b ON b.ID=r.ID2
 WHERE (r.ID1=4 OR r.ID2=4) 

Then, if you still have trouble with performance after you create the covering indexes, refactor it again to:

select   a.entity, r.relation, b.entity 
  from T1 as r 
 INNER JOIN T1_Prop as a ON a.ID=r.ID1  
 INNER JOIN T1_Prop as b ON b.ID=r.ID2
 WHERE r.ID1=4 
UNION
select   a.entity, r.relation, b.entity 
  from T1 as r 
 INNER JOIN T1_Prop as a ON a.ID=r.ID1  
 INNER JOIN T1_Prop as b ON b.ID=r.ID2
 WHERE r.ID2=4 
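
Note that UNION removes duplicate result rows, which adds a de-duplication step. If that step turns out to matter, one possible variant (a sketch, not part of the original answer) uses UNION ALL and excludes from the second branch the rows already matched by the first:

```sql
-- First branch: relations where ID1 is 4
SELECT a.entity, r.relation, b.entity
  FROM T1 AS r
 INNER JOIN T1_Prop AS a ON a.ID = r.ID1
 INNER JOIN T1_Prop AS b ON b.ID = r.ID2
 WHERE r.ID1 = 4
UNION ALL
-- Second branch: relations where ID2 is 4, skipping rows the first branch returned
SELECT a.entity, r.relation, b.entity
  FROM T1 AS r
 INNER JOIN T1_Prop AS a ON a.ID = r.ID1
 INNER JOIN T1_Prop AS b ON b.ID = r.ID2
 WHERE r.ID2 = 4
   AND r.ID1 <> 4;
```

Because the inner join on a.ID = r.ID1 already discards rows with a NULL ID1, the extra predicate should not drop any row the OR version would return. Unlike UNION, this variant keeps result rows that happen to be identical.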

Your query looks fine except for the first ON clause. The condition (r.ID1=4 OR r.ID2=4) is not a rule for which record from T1_Prop to join to the T1 record. Rather, it is a condition on which T1 records to consider, and hence it belongs in the WHERE clause.

select 
  a.entity AS entity1, 
  r.relation, 
  b.entity AS entity2
FROM t1 AS r 
INNER JOIN t1_prop AS a ON a.id = r.id1
INNER JOIN t1_prop AS b ON b.id = r.id2
WHERE r.id1 = 4 OR r.id2 = 4;

This won't change the execution plan; the DBMS will execute this just the same. But it's more readable, as it shows the actual intention: get the relations where one of the IDs is 4 and join the entities to those relations.

Another option to show this intention is:

select 
  a.entity AS entity1, 
  r.relation, 
  b.entity AS entity2
FROM (SELECT * FROM t1 WHERE id1 = 4 OR id2 = 4) AS r 
INNER JOIN t1_prop AS a ON a.id = r.id1
INNER JOIN t1_prop AS b ON b.id = r.id2;

Some consider subqueries in FROM less readable, but, well, others don't. And when queries get more complex and, say, you even deal with aggregates from different tables, this is often the way to go to build a clean query.

Neither of the above queries is actually better or worse than the other.
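
One way to check this on your own data is to compare the plans MySQL reports for both forms with EXPLAIN. The sketch below assumes the lowercase table and column names used in this answer. The output depends on your MySQL version, indexes, and data, so none is shown here.

```sql
-- Plan for the version with the filter in the WHERE clause
EXPLAIN
SELECT a.entity AS entity1, r.relation, b.entity AS entity2
FROM t1 AS r
INNER JOIN t1_prop AS a ON a.id = r.id1
INNER JOIN t1_prop AS b ON b.id = r.id2
WHERE r.id1 = 4 OR r.id2 = 4;

-- Plan for the version with the filter in a derived table
EXPLAIN
SELECT a.entity AS entity1, r.relation, b.entity AS entity2
FROM (SELECT * FROM t1 WHERE id1 = 4 OR id2 = 4) AS r
INNER JOIN t1_prop AS a ON a.id = r.id1
INNER JOIN t1_prop AS b ON b.id = r.id2;
```

If the rows for r show the same access type and key in both outputs, the two forms are being executed the same way. On older MySQL versions a derived table may be materialized rather than merged into the outer query, which is the main case where the plans could differ.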
