SQL performance - query design when fetching from multiple tables

Ok, premise. Three tables, simple enough for this exercise:

table first:
id, name

table second:
id, firstId, secondName

table third:
id, thirdName, secondId

I want to take all rows in third that have a foreign key to a row in second that is related to a certain "first" row id.

Typical SQL:

select t.id, s.id as secondId, t.thirdName, s.secondName from third t 
inner join second s on t.secondId=s.id where s.firstId = X

So here is my question:

Would it be faster, performance-wise, to instead have a column in third that is a foreign key directly to first?

i.e.

table third:
id, secondId, firstId, thirdName

So that I could instead make the query:

select t.id, s.id as secondId, t.thirdName, s.secondName from third t 
inner join second s on t.secondId=s.id where t.firstId = X 

There are no fewer joins, since I need the data from "second" too, but I'd do the lookup on "firstId" in third rather than in second.

Just curious if anybody has any input :)

Suppose the second way is faster. You can then rewrite your first query as:

select t.id, s.id as secondId, t.thirdName, s.secondName from second s
inner join third t on t.secondId=s.id where s.firstId = X

Note the swapped placements of second and third. With this you will see the exact same performance as your second example, but the third table will be smaller because it doesn't have the extra redundant field.
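A quick way to convince yourself that swapping the join order leaves the result unchanged is to run both forms against a tiny database. The following is only a sketch under assumptions: it uses Python's sqlite3 module with an in-memory database and made-up rows (the question names neither an engine nor any data), and it compares results, not timings.

```python
import sqlite3

# In-memory SQLite database; engine and sample rows are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- "first" is quoted because FIRST is a keyword in some SQL dialects.
    CREATE TABLE "first" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE second (id INTEGER PRIMARY KEY, firstId INTEGER, secondName TEXT);
    CREATE TABLE third  (id INTEGER PRIMARY KEY, thirdName TEXT, secondId INTEGER);
    INSERT INTO "first" VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO second  VALUES (10, 1, 's10'), (11, 1, 's11'), (12, 2, 's12');
    INSERT INTO third   VALUES (100, 't100', 10), (101, 't101', 11), (102, 't102', 12);
""")

# Original form: third listed first, then joined to second.
THIRD_FIRST = """
    SELECT t.id, s.id AS secondId, t.thirdName, s.secondName
    FROM third t INNER JOIN second s ON t.secondId = s.id
    WHERE s.firstId = ?
"""
# Rewritten form: second listed first, then joined to third.
SECOND_FIRST = """
    SELECT t.id, s.id AS secondId, t.thirdName, s.secondName
    FROM second s INNER JOIN third t ON t.secondId = s.id
    WHERE s.firstId = ?
"""

rows_a = sorted(conn.execute(THIRD_FIRST, (1,)).fetchall())
rows_b = sorted(conn.execute(SECOND_FIRST, (1,)).fetchall())
print(rows_a == rows_b)
```

Whether the two forms also perform the same depends on the engine, since many optimizers choose the order of inner joins themselves regardless of how the query is written.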

To show the benefits of not having this field, it's easier to point out what adding an extra redundant field will do to performance:

  • consume more disk space
  • slow down any table scans because rows will now be slightly longer
  • make updates slightly slower
  • among others...

Although this is all theoretical, overall it sounds an awful lot like a premature optimization. You should only do this IF your existing query is slow (even after my rewrite of it above), and at that point you will get much better bang for your buck by just improving your indexes.
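As a rough illustration of what "improving your indexes" could mean for the original query, the sketch below indexes the filter column and the join column and then asks the engine for its plan. The index names are hypothetical, and SQLite (via Python's sqlite3) is used only because it makes a self-contained demo; the question does not say which database is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE second (id INTEGER PRIMARY KEY, firstId INTEGER, secondName TEXT);
    CREATE TABLE third  (id INTEGER PRIMARY KEY, thirdName TEXT, secondId INTEGER);
    -- Hypothetical index names: one index for the WHERE column, one for the join column.
    CREATE INDEX idx_second_firstId ON second (firstId);
    CREATE INDEX idx_third_secondId ON third (secondId);
""")

QUERY = """
    SELECT t.id, s.id AS secondId, t.thirdName, s.secondName
    FROM third t INNER JOIN second s ON t.secondId = s.id
    WHERE s.firstId = ?
"""

# EXPLAIN QUERY PLAN returns one row per plan step; the last column is a
# human-readable description whose exact wording varies by SQLite version.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,))]
for step in plan:
    print(step)
```

If the printed steps mention the index names, the planner is using them. Other engines have comparable tools (MySQL has EXPLAIN, for instance), though the output format differs.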

The surest way to find out is to try it and see.
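A minimal harness for "try it and see" might look like the following. Everything specific here is an assumption: the row counts are made up, SQLite stands in for whatever engine is actually used, and no indexes are defined, so the timings say nothing about a real system. The point is only the shape of the experiment: same data, both filters, compare results and elapsed time.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE second (id INTEGER PRIMARY KEY, firstId INTEGER, secondName TEXT);
    CREATE TABLE third  (id INTEGER PRIMARY KEY, thirdName TEXT, secondId INTEGER, firstId INTEGER);
""")

# Made-up volumes: 50 "first" ids, 20 second rows per first, 10 third rows per second.
second_rows = [(s, s % 50, f"s{s}") for s in range(1000)]
third_rows = [(t, f"t{t}", t % 1000, (t % 1000) % 50) for t in range(10000)]
conn.executemany("INSERT INTO second VALUES (?, ?, ?)", second_rows)
conn.executemany("INSERT INTO third VALUES (?, ?, ?, ?)", third_rows)

SELECT = """
    SELECT t.id, s.id AS secondId, t.thirdName, s.secondName
    FROM third t INNER JOIN second s ON t.secondId = s.id
"""
VARIANTS = {
    "filter on second.firstId": SELECT + " WHERE s.firstId = ?",
    "filter on third.firstId": SELECT + " WHERE t.firstId = ?",
}

def timed(sql, first_id, repeats=20):
    """Run the query several times; return its rows and the best wall-clock time."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        rows = conn.execute(sql, (first_id,)).fetchall()
        best = min(best, time.perf_counter() - start)
    return rows, best

results = {label: timed(sql, 7) for label, sql in VARIANTS.items()}
for label, (rows, elapsed) in results.items():
    print(f"{label}: {len(rows)} rows, best of 20 runs {elapsed:.6f}s")
```

For a meaningful answer, the same experiment would need to run on the real engine, with the real indexes and representative data volumes.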

However, given that you need to join to the second table anyway, I would actually expect it to be a bit slower, since you would have to fetch all records from the table third first, and then link each of them to the appropriate record in second, rather than fetching the second records first and then linking to the third records. So (taking n as the number of matching second records and m as the number of third records per second record) you would be retrieving 2*m*n records in the first scenario, and only (m+1)*n records in the second.

Of course, if you didn't need to link to the second table, the query would run much faster if it only accessed the third table.

Your proposed design would be incorrect. There is nothing to guarantee that third.firstId matches the second.firstId of the parent row.

Correctness is more important than performance!
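To make the problem concrete, here is a small sketch (SQLite through Python's sqlite3, with made-up rows, both of which are assumptions and not part of the question) in which every individual foreign key is satisfied and yet the two ways of filtering disagree:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE "first" (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE second (id INTEGER PRIMARY KEY,
                         firstId INTEGER NOT NULL REFERENCES "first"(id),
                         secondName TEXT);
    CREATE TABLE third  (id INTEGER PRIMARY KEY,
                         secondId INTEGER NOT NULL REFERENCES second(id),
                         firstId  INTEGER NOT NULL REFERENCES "first"(id),
                         thirdName TEXT);
    INSERT INTO "first" VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO second  VALUES (10, 1, 's10');
    -- Each foreign key is valid on its own, yet this row claims first 2
    -- while its parent row in second belongs to first 1.
    INSERT INTO third   VALUES (100, 10, 2, 't100');
""")

JOIN = "SELECT t.id FROM third t INNER JOIN second s ON t.secondId = s.id"
via_second = conn.execute(JOIN + " WHERE s.firstId = ?", (1,)).fetchall()
via_third = conn.execute(JOIN + " WHERE t.firstId = ?", (1,)).fetchall()
print(via_second)  # row 100 is found through its parent in second
print(via_third)   # row 100 is missed because third.firstId says 2
```

The database accepted the inconsistent row without complaint, so the two queries silently return different answers for the same "first" id.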


That said, you might be able to use identifying relationships and natural keys (as opposed to non-identifying relationships and surrogate keys):

[image not available]

This is appropriate if thirdName does not need to be unique on its own, but only in the context of the parent row from the second table, and secondName does not need to be unique on its own, but only in the context of the parent row from the first table.

In this scenario, you could avoid JOINs altogether and still get firstId, secondName and thirdName:

SELECT *
FROM third
WHERE firstId = X
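Since the diagram is not reproduced here, the following is one guess at what such a schema could look like, not a reconstruction of the original image. The composite keys are inferred from the uniqueness conditions described above. It is written for SQLite through Python's sqlite3 only so that it runs as-is, and it says nothing about InnoDB's clustering behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE "first" (id INTEGER PRIMARY KEY, name TEXT);
    -- secondName is unique only within its parent "first" row.
    CREATE TABLE second (
        firstId    INTEGER NOT NULL REFERENCES "first"(id),
        secondName TEXT    NOT NULL,
        PRIMARY KEY (firstId, secondName)
    );
    -- thirdName is unique only within its parent second row; the parent's
    -- whole key migrates into the child, so firstId cannot drift out of sync.
    CREATE TABLE third (
        firstId    INTEGER NOT NULL,
        secondName TEXT    NOT NULL,
        thirdName  TEXT    NOT NULL,
        PRIMARY KEY (firstId, secondName, thirdName),
        FOREIGN KEY (firstId, secondName) REFERENCES second (firstId, secondName)
    );
    INSERT INTO "first" VALUES (1, 'f1'), (2, 'f2');
    INSERT INTO second  VALUES (1, 'a'), (2, 'a');
    INSERT INTO third   VALUES (1, 'a', 'x'), (1, 'a', 'y'), (2, 'a', 'x');
""")

# The join-free query: all three values come straight from third.
rows = conn.execute(
    "SELECT * FROM third WHERE firstId = ? ORDER BY thirdName", (1,)
).fetchall()
print(rows)

# A child row whose (firstId, secondName) pair has no parent is rejected.
try:
    conn.execute("INSERT INTO third VALUES (2, 'b', 'z')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

With this layout the composite foreign key ensures that third.firstId always matches an existing parent row in second, which the extra firstId column in the surrogate-key design could not guarantee.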

Even if there are other fields, not shown above, that you need to read from second, the JOIN will still be faster because InnoDB clusters the data and you'd more naturally follow this clustering. And by avoiding surrogate keys, you'd avoid expensive secondary indexes (see "Disadvantages of clustering" in this article).

The price you pay is that each successive child table grows progressively "fatter". Whether this is a price worth paying, only you can determine, by performing measurements on representative amounts of data.

Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address.

 