
Audit table vs. Type 2 Slowly Changing Dimension

In SQL Server 2008+, we'd like to enable tracking of historical changes to a "Customers" table in an operational database.

It's a new table and our app controls all writing to the database, so we don't need evil hacks like triggers. Instead, we will build the change tracking into our business object layer, but we need to figure out the right database schema to use.

The number of rows will be under 100,000, and the number of changes per record will average 1.5 per year.

There are at least two ways we've been looking at modelling this:

  1. As a Type 2 Slowly Changing Dimension table called CustomersHistory, with columns for EffectiveStartDate, EffectiveEndDate (set to NULL for the current version of the customer), and auditing columns like ChangeReason and ChangedByUsername. Then we'd build a Customers view over that table, filtered to EffectiveEndDate IS NULL. Most parts of our app would query using that view, and only parts that need to be history-aware would query the underlying table. For performance, we could materialize the view and/or add a filtered index on EffectiveEndDate IS NULL.

  2. With a separate audit table. Every change to a Customer record writes once to the Customer table and again to a CustomerHistory audit table.
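
To make option #1 concrete, here is a minimal sketch of the Type 2 layout. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere; the column types, the Name column, the index name, the save_customer helper and the sample data are all assumptions for illustration, not part of our actual schema. SQLite's partial index plays the role of a SQL Server filtered index here.

```python
import sqlite3

# SQLite stands in for SQL Server; types, the Name column and sample data are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomersHistory (
    CustomerId         INTEGER NOT NULL,
    Name               TEXT    NOT NULL,
    EffectiveStartDate TEXT    NOT NULL,
    EffectiveEndDate   TEXT,               -- NULL marks the current version
    ChangeReason       TEXT,
    ChangedByUsername  TEXT,
    PRIMARY KEY (CustomerId, EffectiveStartDate)
);
-- Partial (filtered) unique index: at most one current row per customer.
CREATE UNIQUE INDEX IX_CustomersHistory_Current
    ON CustomersHistory (CustomerId) WHERE EffectiveEndDate IS NULL;
-- The view most of the app would query. Note IS NULL: "= NULL" never matches.
CREATE VIEW Customers AS
    SELECT CustomerId, Name FROM CustomersHistory WHERE EffectiveEndDate IS NULL;
""")

def save_customer(conn, customer_id, name, when, reason, username):
    """Hypothetical helper: close the current version (if any), then insert the new one."""
    with conn:  # single transaction: commits on success, rolls back on error
        conn.execute(
            "UPDATE CustomersHistory SET EffectiveEndDate = ? "
            "WHERE CustomerId = ? AND EffectiveEndDate IS NULL",
            (when, customer_id),
        )
        conn.execute(
            "INSERT INTO CustomersHistory (CustomerId, Name, EffectiveStartDate, "
            "EffectiveEndDate, ChangeReason, ChangedByUsername) "
            "VALUES (?, ?, ?, NULL, ?, ?)",
            (customer_id, name, when, reason, username),
        )

save_customer(conn, 1, "Acme Ltd", "2024-01-01", "created", "alice")
save_customer(conn, 1, "Acme Inc", "2024-06-01", "renamed", "bob")

current = conn.execute("SELECT CustomerId, Name FROM Customers").fetchall()
history_count = conn.execute("SELECT COUNT(*) FROM CustomersHistory").fetchone()[0]
print(current, history_count)
```

The business layer only ever calls the one helper, and the unique partial index guards against ending up with two "current" rows for the same customer.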

From a quick review of StackOverflow questions, #2 seems to be much more popular. But is this because most DB apps have to deal with legacy and rogue writers?

Given that we're starting from a blank slate, what are the pros and cons of either approach? Which would you recommend?

In general, the issue with SCD Type II is that if the attribute values change very frequently, you end up with a very fat dimension table. Joining this growing dimension table with a huge fact table gradually slows down query performance. It's like slow poisoning: initially you don't see the impact, and by the time you notice it, it's too late!

Now, I understand that you will create a separate materialized view filtered on EffectiveEndDate IS NULL, which will be used in most of your joins. In addition, your data volume is comparatively low (under 100,000 rows). With only 1.5 changes per record per year on average, I don't think data volume or query performance will be a problem for you in the near future.
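
As a rough back-of-the-envelope check on that volume, the sketch below projects the size of the history table from the figures in the question. It assumes the customer count stays flat at the 100,000 upper bound and that every change adds exactly one history row; both are simplifications.

```python
# Figures from the question; a flat customer count is an assumption.
customers = 100_000
changes_per_customer_per_year = 1.5

def projected_history_rows(years):
    """One initial row per customer plus one new row per change."""
    return int(customers * (1 + changes_per_customer_per_year * years))

projection = {years: projected_history_rows(years) for years in (1, 5, 10)}
print(projection)  # {1: 250000, 5: 850000, 10: 1600000}
```

Even after ten years that is on the order of 1.6 million rows, which is small by SQL Server standards.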

In other words, your table is truly a slowly changing dimension (as opposed to a rapidly changing dimension, where your option #2 would be a better fit). In your case, I would prefer option #1.
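
For comparison, option #2 from the question could be sketched roughly as follows. As before, this uses Python's sqlite3 as a stand-in for SQL Server, and the AuditId, Name and ChangedAt columns, the save_customer_audited helper and the sample data are assumptions made up for illustration.

```python
import sqlite3

# SQLite stands in for SQL Server; column choices and sample data are assumptions.
audit_conn = sqlite3.connect(":memory:")
audit_conn.executescript("""
CREATE TABLE Customer (
    CustomerId INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL
);
CREATE TABLE CustomerHistory (
    AuditId           INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerId        INTEGER NOT NULL,
    Name              TEXT    NOT NULL,
    ChangedAt         TEXT    NOT NULL,
    ChangeReason      TEXT,
    ChangedByUsername TEXT
);
""")

def save_customer_audited(conn, customer_id, name, when, reason, username):
    """Hypothetical helper: write the live row, then append an audit row."""
    with conn:  # both writes succeed or fail together
        updated = conn.execute(
            "UPDATE Customer SET Name = ? WHERE CustomerId = ?",
            (name, customer_id),
        ).rowcount
        if updated == 0:  # no existing row, so this is a new customer
            conn.execute(
                "INSERT INTO Customer (CustomerId, Name) VALUES (?, ?)",
                (customer_id, name),
            )
        conn.execute(
            "INSERT INTO CustomerHistory (CustomerId, Name, ChangedAt, "
            "ChangeReason, ChangedByUsername) VALUES (?, ?, ?, ?, ?)",
            (customer_id, name, when, reason, username),
        )

save_customer_audited(audit_conn, 1, "Acme Ltd", "2024-01-01", "created", "alice")
save_customer_audited(audit_conn, 1, "Acme Inc", "2024-06-01", "renamed", "bob")

live_rows = audit_conn.execute("SELECT CustomerId, Name FROM Customer").fetchall()
audit_rows = audit_conn.execute(
    "SELECT Name, ChangeReason FROM CustomerHistory ORDER BY AuditId"
).fetchall()
print(live_rows, audit_rows)
```

Here the live table stays at one row per customer no matter how often records change, which is why this layout copes better with rapidly changing data; the price is that every save writes to two tables.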
