SQL Server: UNION ALL vs Aggregate all columns into one table

I have a question about the performance of two designs. The goal is to store multiple types of entities that share some attributes but differ in others.

Approach 1: Multiple tables, each modelling one entity

Entity1 - C1, C2, C3
Entity2 - C1, C2, C4
Entity3 - C1, C2, C5    

To query across entities, I need to perform a UNION ALL over all the tables.
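A minimal sketch of what the Approach 1 query could look like. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere; the column types, the sample rows, and the `Type` label added in each branch are assumptions for illustration. Columns that an entity lacks are padded with NULL so every branch has the same shape.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entity1 (C1 TEXT, C2 TEXT, C3 INTEGER);
    CREATE TABLE Entity2 (C1 TEXT, C2 TEXT, C4 INTEGER);
    CREATE TABLE Entity3 (C1 TEXT, C2 TEXT, C5 INTEGER);
    INSERT INTO Entity1 VALUES ('2024-01-01', 'a', 3);
    INSERT INTO Entity2 VALUES ('2024-01-02', 'b', 4);
    INSERT INTO Entity3 VALUES ('2024-01-03', 'c', 5);
""")

# Every branch must return the same number of columns, so the columns
# an entity does not have are filled with NULL.
union_sql = """
    SELECT 'Entity1' AS Type, C1, C2, C3, NULL AS C4, NULL AS C5 FROM Entity1
    UNION ALL
    SELECT 'Entity2', C1, C2, NULL, C4, NULL FROM Entity2
    UNION ALL
    SELECT 'Entity3', C1, C2, NULL, NULL, C5 FROM Entity3
    ORDER BY C1
"""
rows = conn.execute(union_sql).fetchall()
for row in rows:
    print(row)
```

UNION ALL does not remove duplicates, so it avoids the extra de-duplication step that plain UNION performs.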

Approach 2: Single table with all the columns and a type column

All - Type, C1, C2, C3, C4, C5

Here, I can query directly on the columns.
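For comparison, a rough sketch of Approach 2, again with sqlite3 standing in for SQL Server. The column types and sample rows are assumptions. Note that `All` is a reserved word, so the table name has to be quoted (`"All"` here, `[All]` in T-SQL).

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "All" (
        Type TEXT NOT NULL,          -- which entity the row belongs to
        C1 TEXT, C2 TEXT,            -- shared columns
        C3 INTEGER, C4 INTEGER, C5 INTEGER  -- type-specific, NULL when unused
    );
    INSERT INTO "All" (Type, C1, C2, C3) VALUES ('Entity1', '2024-01-01', 'a', 3);
    INSERT INTO "All" (Type, C1, C2, C4) VALUES ('Entity2', '2024-01-02', 'b', 4);
    INSERT INTO "All" (Type, C1, C2, C5) VALUES ('Entity3', '2024-01-03', 'c', 5);
""")

# A shared-column filter needs no UNION ALL: one table, one predicate.
rows = conn.execute(
    'SELECT Type, C1, C2 FROM "All" WHERE C2 = ?', ("b",)
).fetchall()
print(rows)
```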

The question is: does the UNION ALL approach have any performance issues? This is similar to a previously asked question on PostgreSQL, which has not been answered.

EDIT:

Thank you for all the answers.

The entity tables are indexed by date, and most queries filter on the date or on the shared fields. Suppose C1 is a date and C2 is a string: 95% of the queries look like C1>=from and C1<=to, or C2='SomeId'.
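To make that query shape concrete, here is a small sketch with the date filter written inside each UNION ALL branch, so each per-entity index on C1 is available to its own branch. It uses sqlite3 in place of SQL Server; the index names, the ISO-8601 text dates, and the helper `shared_rows_between` are hypothetical, and only two of the entity tables are shown to keep it short.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Entity1 (C1 TEXT, C2 TEXT, C3 INTEGER);
    CREATE TABLE Entity2 (C1 TEXT, C2 TEXT, C4 INTEGER);
    -- One date index per entity table (index names are made up).
    CREATE INDEX IX_Entity1_C1 ON Entity1 (C1);
    CREATE INDEX IX_Entity2_C1 ON Entity2 (C1);
    INSERT INTO Entity1 VALUES ('2024-01-01', 'x', 1), ('2024-02-01', 'y', 2);
    INSERT INTO Entity2 VALUES ('2024-01-15', 'x', 3), ('2024-03-01', 'z', 4);
""")

def shared_rows_between(date_from, date_to):
    """Return the shared columns of all rows with date_from <= C1 <= date_to."""
    sql = """
        SELECT 'Entity1' AS Type, C1, C2 FROM Entity1
        WHERE C1 >= :f AND C1 <= :t
        UNION ALL
        SELECT 'Entity2', C1, C2 FROM Entity2
        WHERE C1 >= :f AND C1 <= :t
        ORDER BY C1
    """
    return conn.execute(sql, {"f": date_from, "t": date_to}).fetchall()

january = shared_rows_between("2024-01-01", "2024-01-31")
print(january)
```

Whether the optimizer uses the indexes is something to confirm from the actual execution plan on the real data.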

The number of records grows slowly, maybe a few hundred per entity per day. The number of columns won't grow beyond 150. However, the number of shared columns is small. Currently I have implemented Approach 1 because each entity may use fields other than the shared ones as its primary key. This way the constraints are more natural.

This choice depends greatly on how wide the table would need to be, whether there are any shared columns, how large the tables will be, what kind of queries you will be running against them, and so on.

As a rule of thumb, do not put everything into one table if the table width will be anywhere close to the maximum width the database supports for a record. Narrower tables tend to perform better. If only a few columns are involved, a single table is likely the best solution.

If the common columns will be the ones most frequently queried, consider designing a parent table with the common columns and three child tables for the type-specific ones.

If there are very few common columns and each type will usually be queried by itself (Type A and Type B would not generally both be in the result set of the most frequently run queries), then separate tables, plus a view that does the UNION ALL for the few times you need to query all of them, will work.
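A sketch of the separate-tables-plus-view idea, with sqlite3 standing in for SQL Server. The table names `TypeA` and `TypeB`, the view name `AllTypes`, and the sample rows are all made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TypeA (C1 TEXT, C2 TEXT, C3 INTEGER);
    CREATE TABLE TypeB (C1 TEXT, C2 TEXT, C4 INTEGER);
    INSERT INTO TypeA VALUES ('2024-01-01', 'SomeId', 3);
    INSERT INTO TypeB VALUES ('2024-01-02', 'SomeId', 4);
    INSERT INTO TypeB VALUES ('2024-01-03', 'OtherId', 5);

    -- The view exposes only the shared columns plus a type label.
    CREATE VIEW AllTypes AS
        SELECT 'A' AS Type, C1, C2 FROM TypeA
        UNION ALL
        SELECT 'B' AS Type, C1, C2 FROM TypeB;
""")

# Day-to-day queries hit a single narrow table directly.
only_a = conn.execute("SELECT C1, C2, C3 FROM TypeA").fetchall()

# The occasional cross-type query goes through the view.
both = conn.execute(
    "SELECT Type, C1 FROM AllTypes WHERE C2 = ? ORDER BY C1", ("SomeId",)
).fetchall()
print(only_a, both)
```

The view keeps the UNION ALL in one place, so callers that need every type do not have to repeat it.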

If you only need to query all types for reporting, and not for ordinary day-to-day work, consider having separate tables and a data warehouse for reporting.

Roughly how many rows are you planning to have? I have experience working with a large table like this where the single-table approach was chosen, and it is very slow to get any data back unless you hit one of the indexes (the table is approximately 250 columns by almost 1 billion rows).

Because of the number of columns, it is not practical to build an index for every common filtering criterion, as this would slow down inserts considerably on a transactional system. That case would certainly be a lot easier if the tables were separate and we had a view to put them together for the occasions when we have to query all of the data together.

However, I am conscious that there are a lot of variables to consider. If you are working with a database that is primarily used for OLAP rather than OLTP, then you may not have any concerns about adding a lot of indexes, for example.

As an alternative, you may combine approaches 1 and 2, i.e. you may create an "ancestor" table:

All - ID, Type, C1, C2

And three "descendant" tables, where ID is the PK and at the same time an FK to the ID of the All table:

Entity1 - ID, C3
Entity2 - ID, C4
Entity3 - ID, C5
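The ancestor/descendant layout above might be sketched like this, using sqlite3 in place of SQL Server. The column types and sample rows are assumptions, only two of the three descendant tables are shown, and `All` is quoted because it is a reserved word.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE "All" (
        ID INTEGER PRIMARY KEY,
        Type TEXT NOT NULL,
        C1 TEXT, C2 TEXT              -- shared columns live here only
    );
    -- In each descendant, ID is both the PK and an FK to "All"(ID).
    CREATE TABLE Entity1 (ID INTEGER PRIMARY KEY REFERENCES "All"(ID), C3 INTEGER);
    CREATE TABLE Entity2 (ID INTEGER PRIMARY KEY REFERENCES "All"(ID), C4 INTEGER);
    INSERT INTO "All" VALUES (1, 'Entity1', '2024-01-01', 'a');
    INSERT INTO "All" VALUES (2, 'Entity2', '2024-01-02', 'b');
    INSERT INTO Entity1 VALUES (1, 3);
    INSERT INTO Entity2 VALUES (2, 4);
""")

# Queries on the shared columns touch only the ancestor table.
shared = conn.execute(
    'SELECT ID, Type FROM "All" WHERE C1 >= ? ORDER BY ID', ("2024-01-01",)
).fetchall()

# Type-specific columns are reached with a join on ID.
entity1 = conn.execute(
    'SELECT a.ID, a.C1, a.C2, e.C3 FROM "All" AS a JOIN Entity1 AS e ON e.ID = a.ID'
).fetchall()

# A descendant row without a matching ancestor row is rejected by the FK.
try:
    conn.execute("INSERT INTO Entity1 VALUES (99, 0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(shared, entity1, orphan_rejected)
```

With this layout the common date and ID filters need no UNION ALL at all, at the cost of a join whenever type-specific columns are needed.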


 