
Data Warehouse - Versioning of data

I am currently designing a data warehouse for a financial company. While much of the process is quite standard, I have run into an issue (which I believe exists only in the finance sector): data events that can happen at any time and that affect a number of rows and their history.

To explain the issue better: assume we have an Account A, and over 2 months 4 transactions have occurred that affect its balance, changing it from 10000 to 20000. When I run a report for that month, it's fine: it will show the activity that derives that value. Now it gets difficult: a month later I backdate a transaction that affects that balance, changing it from 20000 to 15000.

A report run before that backdating should tell me the original 20000, but a report run after the backdated transaction should tell me 15,000.

To illustrate, refer to the data below.


Transactions for September and October,

with a backdated transaction of $500, entered on the 28th of October, for the 13th of September,

and a backdated transaction, entered on the 8th of November, for the 17th of September to credit the $-50.

╔═════════════════╦═════════════════════════╦════════╦══════════════════╦═══════════════╦═════════════╦═════════╗
║ Key_Transaction ║ SK_TransactionEffective ║ Amount ║ PrincipleBalance ║ SK_ReportDate ║ SK_AsOfDate ║ Version ║
╠═════════════════╬═════════════════════════╬════════╬══════════════════╬═══════════════╬═════════════╬═════════╣
║               1 ║ 12/09/2018              ║  -1000 ║            20000 ║ 12/09/2018    ║ NULL        ║ 1       ║
║               6 ║ 13/09/2018              ║   -500 ║            19500 ║ 13/09/2018    ║ 28/10/2018  ║ 2       ║
║               2 ║ 16/09/2018              ║    -50 ║            19950 ║ 16/09/2018    ║ NULL        ║ 1       ║
║               7 ║ 16/09/2018              ║    -50 ║            19450 ║ 16/09/2018    ║ 28/10/2018  ║ 2       ║
║              12 ║ 16/09/2018              ║     50 ║            19950 ║ 16/09/2018    ║ 8/11/2018   ║ 3       ║
║               3 ║ 1/10/2018               ║    250 ║            20200 ║ 30/09/2018    ║ NULL        ║ 1       ║
║               8 ║ 1/10/2018               ║    250 ║            19700 ║ 30/09/2018    ║ 28/10/2018  ║ 2       ║
║              13 ║ 1/10/2018               ║    250 ║            20200 ║ 30/09/2018    ║ 8/11/2018   ║ 3       ║
║               4 ║ 6/10/2018               ║  -1200 ║            19000 ║ 6/10/2018     ║ NULL        ║ 1       ║
║               9 ║ 6/10/2018               ║  -1200 ║            17800 ║ 6/10/2018     ║ 28/10/2018  ║ 2       ║
║              14 ║ 6/10/2018               ║  -1200 ║            19000 ║ 6/10/2018     ║ 8/11/2018   ║ 3       ║
║               5 ║ 22/10/2018              ║    100 ║            19100 ║ 22/10/2018    ║ NULL        ║ 1       ║
║              10 ║ 22/10/2018              ║    100 ║            17900 ║ 22/10/2018    ║ 28/10/2018  ║ 2       ║
║              15 ║ 22/10/2018              ║    100 ║            19100 ║ 22/10/2018    ║ 8/11/2018   ║ 3       ║
║              11 ║ 29/10/2018              ║  -1000 ║            16900 ║ 29/10/2018    ║ NULL        ║ (New)1  ║
║              16 ║ 29/10/2018              ║  -1000 ║            18100 ║ 29/10/2018    ║ 8/11/2018   ║ (New)2  ║
╚═════════════════╩═════════════════════════╩════════╩══════════════════╩═══════════════╩═════════════╩═════════╝

Now when I run a report for September (2018-09-01 to 2018-09-30), I should see V1, i.e. the rows where SK_AsOfDate is NULL.

If I run a report for October (2018-10-01 to 2018-10-31), my last record should be (11), with a principal balance of 16900.

And my current principal balance (as of 2018-11-09) should be taken from the balance of record (16), with a PB of 18100.

I have added SK_AsOfDate to try to deal with the versioning issue, but I am still struggling to see a simple and elegant way to answer "What was my balance as of 2018-09-30?" in a way that ignores the V2 and V3 alterations.

I want to get this right and luckily am not too far down any path, so suggestions are welcome! I am happy to add as many fields as it takes to make this process simple for reporting at the other end.

In financial (and some other) transaction data you basically have two time dimensions.

Transaction date - represents the real time at which the transaction happened; for technical reasons you may receive the transaction later.

Booking date - the timestamp at which the transaction entered your booking system. Sometimes called entry date.

With respect to the transaction date, a transaction may be late arriving; the booking date, by contrast, is by definition always up to date.

The two time dimensions allow two different kinds of reports.

The booking date report is typically used for bookkeeping purposes (as its history never changes). The transaction date report is more realistic in time, but running it for the last month on two different days can produce two different results (because a late transaction arrived by the second day).
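A minimal sketch of how the two dates could drive both kinds of report. The names `Txn`, `LEDGER`, `OPENING_BALANCE` and `balance` are hypothetical, the opening balance of 21000 is an assumption, and the amounts only loosely follow the description in the question (they are not meant to reproduce the question's table row for row).

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Txn:
    transaction_date: date  # when the transaction really took effect
    booking_date: date      # when it entered the booking system
    amount: int


OPENING_BALANCE = 21000  # assumed balance before the first transaction

# Illustrative ledger: two rows are backdated (booked later than they took effect).
LEDGER = [
    Txn(date(2018, 9, 12), date(2018, 9, 12), -1000),
    Txn(date(2018, 9, 16), date(2018, 9, 16), -50),
    Txn(date(2018, 10, 1), date(2018, 10, 1), 250),
    Txn(date(2018, 10, 6), date(2018, 10, 6), -1200),
    Txn(date(2018, 10, 22), date(2018, 10, 22), 100),
    Txn(date(2018, 9, 13), date(2018, 10, 28), -500),  # backdated
    Txn(date(2018, 10, 29), date(2018, 10, 29), -1000),
    Txn(date(2018, 9, 17), date(2018, 11, 8), 50),     # backdated
]


def balance(effective_to: date, known_at: date) -> int:
    """Balance up to `effective_to`, using only rows booked on or before `known_at`."""
    return OPENING_BALANCE + sum(
        t.amount
        for t in LEDGER
        if t.transaction_date <= effective_to and t.booking_date <= known_at
    )


end_of_september = date(2018, 9, 30)
print(balance(end_of_september, known_at=date(2018, 9, 30)))  # 19950
print(balance(end_of_september, known_at=date(2018, 10, 31)))  # 19450
print(balance(end_of_september, known_at=date(2018, 11, 9)))   # 19500
```

Filtering on the booking date freezes the report at what was known on that day, so the September figure stays reproducible; moving `known_at` forward gives the restated figure for the same period.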

This looks like a problem with late arriving facts. The question is what you would like to report later: would you like to report the new values somehow, or just ignore the newly arrived facts?

The first step is to determine a business key that makes it possible to detect the difference. Are the Amount or PrincipleBalance for a Key_Transaction changing over time, or are only new records arriving? Try to create snapshots of the table to find how the values differ over time, in order to create a good business key.
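One way to compare two snapshots might look like the sketch below. The business key `(account, effective date)` and the function name `diff_snapshots` are assumptions made for illustration; the sample values are taken from the September rows in the question's table.

```python
def diff_snapshots(old, new):
    """Compare two snapshots, each a dict of business key -> (amount, balance)."""
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


# Snapshot taken before the backdated transaction was booked.
before = {
    ("A", "2018-09-12"): (-1000, 20000),
    ("A", "2018-09-16"): (-50, 19950),
}
# Snapshot taken after the backdated -500 for 2018-09-13 was booked.
after = {
    ("A", "2018-09-12"): (-1000, 20000),
    ("A", "2018-09-13"): (-500, 19500),
    ("A", "2018-09-16"): (-50, 19450),
}

added, removed, changed = diff_snapshots(before, after)
print(added)    # {('A', '2018-09-13'): (-500, 19500)}
print(removed)  # {}
print(changed)  # {('A', '2018-09-16'): ((-50, 19950), (-50, 19450))}
```

In this sample the amount of the existing row is stable and only its running balance moves, which would suggest that new records arrive and the balances are recomputed, rather than the amounts themselves being edited.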

Some good ideas can be found here: http://www.disoln.org/2013/12/Design-Approach-to-Handle-Late-Arriving-Dimensions-and-Late-Arriving-Facts.html

What is the source database? In SQL Server you can try to use Change Data Capture (it has to be enabled on the server), or build the mechanism mentioned above in your ETL.

I guess that the table you mentioned is not at the lowest grain but is some kind of aggregation based on other tables. Try to ask what sits behind it technically and dig deeper to find out how it works.

I think your case can be solved with "snapshot" tables. In the financial world, as you explained, "as of 2018-10-31" or "as of 2018-11-09" matters, and you need to keep a copy of your data for each "as of" date. The frequency may differ for each organization; yours looks like it is weekly. It is up to you to decide the frequency. Once you have this set of data, you can go back and get an accurate report regardless of the final state.

The way to create these "snapshot" tables is basically to create a copy of your fact table on each "as of" date, stamped with a "snapshot date". This snapshot date (a.k.a. "as of") can then be used in your reports to select the version of the data you need to see.
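A minimal sketch of that idea, using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`fact_transaction`, `fact_transaction_snapshot`, `snapshot_date`) and the helper functions are hypothetical, and a real warehouse would do this in its own ETL tooling.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_transaction (
    transaction_effective TEXT,   -- ISO date, so string comparison orders correctly
    amount INTEGER
);
CREATE TABLE fact_transaction_snapshot (
    snapshot_date TEXT,           -- the "as of" date of this copy
    transaction_effective TEXT,
    amount INTEGER
);
""")


def take_snapshot(snapshot_date):
    """Copy the whole fact table, stamping every row with the snapshot date."""
    conn.execute(
        "INSERT INTO fact_transaction_snapshot "
        "SELECT ?, transaction_effective, amount FROM fact_transaction",
        (snapshot_date,),
    )


def total_as_of(as_of, effective_to):
    """Sum amounts up to `effective_to` in the latest snapshot taken on or before `as_of`."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM fact_transaction_snapshot "
        "WHERE snapshot_date = (SELECT MAX(snapshot_date) "
        "                       FROM fact_transaction_snapshot "
        "                       WHERE snapshot_date <= ?) "
        "AND transaction_effective <= ?",
        (as_of, effective_to),
    ).fetchone()
    return row[0]


# September activity, then the month-end snapshot.
conn.executemany(
    "INSERT INTO fact_transaction VALUES (?, ?)",
    [("2018-09-12", -1000), ("2018-09-16", -50)],
)
take_snapshot("2018-09-30")

# A backdated transaction arrives in October, then the next snapshot is taken.
conn.execute("INSERT INTO fact_transaction VALUES (?, ?)", ("2018-09-13", -500))
take_snapshot("2018-10-31")

print(total_as_of("2018-09-30", "2018-09-30"))  # -1050
print(total_as_of("2018-10-31", "2018-09-30"))  # -1550
```

The September total stays at its original value when read from the 2018-09-30 snapshot and shows the restated value from the 2018-10-31 snapshot. The cost is storage: every snapshot is a full copy of the fact table.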

Let me know if this solves your problem.
