What type of fact table / loading solution for a reservation system?

Background

I am designing a Data Warehouse with SQL Server 2012 and SSIS. The source system handles hotel reservations. The reservations are split between two tables, header and header line item. The Fact table would be at the line item level with some data from the header.

The issue

The challenge I have is that the reservation (and its line items) can change over time.

An example would be:

  • The booking is created.
  • A room is added to the booking (as a header line item).
  • The customer arrives and adds food/drinks to their reservation (more line items).
  • A payment is added to the reservation (as a line item).
  • A room could be subsequently cancelled and removed from the booking (a line item is deleted).
  • The number of people in a room can change, affecting that line item.
  • The booking status changes from "Provisional" to "Confirmed" at a point in its life cycle.

Those last three points are key: I'm not sure how I would keep that line updated without looking up the record and updating it. The business would like to keep track of the updates and deletions.

I'm resisting updating because:

  1. From what I've read about Fact tables, it's not good practice to revisit rows once they've been written into the table.
  2. I could do this with a look-up component, but with upward of 45 million rows, is that the best approach?

The questions

  1. What type of Fact table or loading solution should I go for?
  2. Should I be updating the records, and if so, how can I best do that?

I'm open to any suggestions!

Additional Questions (following answer from ElectricLlama):

  1. The fact does have a 1:1 relationship with the source. You talk about possible constraints on future development. Would you be able to elaborate on the type of constraints I would face?
  2. Each line item will have a modified (and created) date. Are you saying that I should delete all records from the fact table which have been modified since the last import and add them again (sounds logical)?
  3. If the answer to 2 is "yes", then for auditing purposes would I write the current fact records to a separate table before deleting them?
  4. In point one, you mention deleting/inserting the last x days' bookings based on reservation date. I can understand inserting new bookings. I'm just trying to understand why I would delete.

If you effectively have a 1:1 between the source line and the fact, and you store a source system booking code in the fact (there are no dimensional modelling rules against that), then I suggest you have a two-step load process.

  1. delete/insert the last x days' bookings based on reservation date (or whatever you consider to be the primary fact date),

  2. delete/insert based on all source booking codes that have changed (you will of course have to know this beforehand).

You just need to consider what constraints this puts on future development, i.e. when you get additional source systems to add, you'll need to maintain the 1:1 fact-to-source-line relationship to keep your load process consistent.

I've never updated a fact record in a data load process, but always delete/insert a certain data domain (i.e. that domain might be the trailing 20 days or a source system booking code). This is effectively the same as an update but also takes care of deletes.
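As a rough illustration of the "delete/insert a data domain" idea, here is a minimal sketch that replaces the trailing window of a fact table in one transaction. It uses Python's built-in sqlite3 module purely as a runnable stand-in for SQL Server; the table name fact_reservation_line, its columns, the dates and the helper reload_trailing_window are all assumptions made up for the example, not anything from the original system.

```python
import sqlite3
from datetime import date, timedelta


def reload_trailing_window(conn, source_rows, as_of, trailing_days):
    """Replace every fact row whose reservation_date falls in the trailing window."""
    window_start = (as_of - timedelta(days=trailing_days)).isoformat()
    with conn:  # one transaction: the delete and the insert commit or roll back together
        conn.execute(
            "DELETE FROM fact_reservation_line WHERE reservation_date >= ?",
            (window_start,),
        )
        conn.executemany(
            "INSERT INTO fact_reservation_line "
            "(booking_code, line_no, reservation_date, amount) VALUES (?, ?, ?, ?)",
            [row for row in source_rows if row[2] >= window_start],
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_reservation_line ("
    "booking_code TEXT, line_no INTEGER, reservation_date TEXT, amount REAL)"
)
# Hypothetical fact contents after yesterday's load.
conn.executemany(
    "INSERT INTO fact_reservation_line VALUES (?, ?, ?, ?)",
    [
        ("B1", 1, "2024-03-01", 100.0),  # older than the window, left untouched
        ("B2", 1, "2024-04-02", 80.0),   # amount has since changed in the source
        ("B2", 2, "2024-04-02", 20.0),   # line has since been deleted in the source
    ],
)
# Hypothetical source contents today.
source_rows = [
    ("B1", 1, "2024-03-01", 100.0),
    ("B2", 1, "2024-04-02", 95.0),
    ("B3", 1, "2024-04-04", 60.0),
]

reload_trailing_window(conn, source_rows, as_of=date(2024, 4, 5), trailing_days=20)
fact_rows = conn.execute(
    "SELECT booking_code, line_no, amount FROM fact_reservation_line "
    "ORDER BY booking_code, line_no"
).fetchall()
print(fact_rows)
```

Because the whole window is wiped and reloaded, the changed line and the deleted line are both handled without any per-row lookup or update.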

With regard to auditing changes in the source, I suggest you write that to a different table altogether, not the main fact, as its purpose will be audit, not analysis.

The requirement to identify changed records in the source (for data loads and auditing) implies you will need to create triggers and log tables in the source, or enable native SQL Server CDC if possible.
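To make the "triggers and log tables" option concrete, here is a small sketch of the pattern. It runs on SQLite through Python's sqlite3 module only so that it is self-contained; SQL Server trigger syntax is different (it works with the inserted and deleted pseudo-tables rather than NEW and OLD), and the table, column and trigger names here are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reservation_line (
    line_id INTEGER PRIMARY KEY, booking_code TEXT, guests INTEGER);

-- Hypothetical log table: one row per change made to a source line.
CREATE TABLE reservation_line_log (
    line_id INTEGER, booking_code TEXT, change_type TEXT,
    logged_at TEXT DEFAULT CURRENT_TIMESTAMP);

CREATE TRIGGER trg_line_update AFTER UPDATE ON reservation_line
BEGIN
    INSERT INTO reservation_line_log (line_id, booking_code, change_type)
    VALUES (NEW.line_id, NEW.booking_code, 'U');
END;

CREATE TRIGGER trg_line_delete AFTER DELETE ON reservation_line
BEGIN
    INSERT INTO reservation_line_log (line_id, booking_code, change_type)
    VALUES (OLD.line_id, OLD.booking_code, 'D');
END;
""")

conn.execute("INSERT INTO reservation_line VALUES (1, 'B1', 2)")
conn.execute("INSERT INTO reservation_line VALUES (2, 'B2', 1)")
conn.execute("UPDATE reservation_line SET guests = 3 WHERE line_id = 1")  # guest count changes
conn.execute("DELETE FROM reservation_line WHERE line_id = 2")            # room is cancelled

changes = conn.execute(
    "SELECT line_id, change_type FROM reservation_line_log ORDER BY line_id"
).fetchall()
print(changes)
```

The load process can then read the log to find which booking codes changed, including deletions that would otherwise leave no trace in the source table.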

At all costs avoid using the SSIS lookup component, as it is totally ineffective and would certainly be unable to operate on 45 million rows.

Stick with the 'insert/delete a data portion' approach, as it lends itself to SSIS's ability to insert/delete (and its inability to efficiently update or lookup).

In answer to the follow-up questions:

  1. 1:1 relationship in fact: What I'm getting at is you have no visibility on any future systems that need to be integrated, or any visibility on what future upgrades to your existing source system might do. This 1:1 mapping introduces a design constraint (it's not really a constraint, more a framework). Thinking about it, any new system does not need to follow this particular load design, as long as its data arrives in the fact consistently. I think implementing this 1:1 design is a good idea; I'm just trying to consider any downside.

  2. If your source has a reliable modified date then you're in luck, as you can do a differential load: only load changed records. I suggest you:

    1. Load all recently modified records (last 5 days?) into a staging table.
    2. Do a DELETE/INSERT based on the record key. Do the delete inside SSIS in an Execute SQL Task; don't mess about with feeding data flows into row-by-row delete statements.
  3. Audit table:

The simplest and most accurate way to do this is simply to implement triggers and logs in the source system and keep it totally separate from your star schema.

If you do want this captured as part of your load process, I suggest you do a comparison between your staging table and the existing audit table and only write new audit rows, i.e. reservation X's last modified date in the audit table is 2 Apr, but reservation X's last modified date in the staging table is 4 Apr, so write this change as a new record to the audit table. Note that if you do a daily load, any changes in between won't be recorded; that's why I suggest triggers and logs in the source.
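The staging-versus-audit comparison described above could look something like the following sketch. It again uses sqlite3 as a stand-in for SQL Server, and the table names stg_reservation and audit_reservation, the status column and the year in the dates are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_reservation   (reservation_id TEXT, status TEXT, last_modified TEXT);
CREATE TABLE audit_reservation (reservation_id TEXT, status TEXT, last_modified TEXT);

-- Audit history so far: X was last seen on 2 Apr, Y on 1 Apr.
INSERT INTO audit_reservation VALUES
    ('X', 'Provisional', '2024-04-02'),
    ('Y', 'Confirmed',   '2024-04-01');

-- Today's staging load: X was modified on 4 Apr, Y is unchanged.
INSERT INTO stg_reservation VALUES
    ('X', 'Confirmed', '2024-04-04'),
    ('Y', 'Confirmed', '2024-04-01');
""")

with conn:
    # Write an audit row only when staging holds a newer version than any audited one.
    conn.execute("""
        INSERT INTO audit_reservation (reservation_id, status, last_modified)
        SELECT s.reservation_id, s.status, s.last_modified
        FROM stg_reservation AS s
        WHERE NOT EXISTS (
            SELECT 1
            FROM audit_reservation AS a
            WHERE a.reservation_id = s.reservation_id
              AND a.last_modified >= s.last_modified
        )
    """)

history_x = conn.execute(
    "SELECT status, last_modified FROM audit_reservation "
    "WHERE reservation_id = 'X' ORDER BY last_modified"
).fetchall()
audit_count = conn.execute("SELECT COUNT(*) FROM audit_reservation").fetchone()[0]
print(history_x, audit_count)
```

Only reservation X gains a new audit row; Y is skipped because its staged version is not newer than what the audit table already holds.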

  4. DELETE/INSERT records in Fact:

This is more about ensuring you have an overlapping window in your load process, so that if the process fails for a couple of days (as they always do), you have some contingency there, and it will seamlessly pick back up once it's working again. This is not so important in your case, as you have a modified date to identify differential changes, but normally I would, for example, pick a transaction date and delete, say, 7 trailing days. This means that my load process can be broken for 6 days, and if I fix it by the seventh day everything will reload properly without needing extra intervention to load the intermediate days.
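The staging-table DELETE/INSERT from point 2, combined with the overlapping window, can be sketched as below. The point of the example is that the load is set-based and safe to rerun. It is written against SQLite via Python's sqlite3 module rather than SQL Server, and fact_line, stg_line, their columns and the apply_staging helper are hypothetical names.

```python
import sqlite3


def apply_staging(conn):
    """Set-based DELETE then INSERT, keyed on (booking_code, line_no)."""
    with conn:  # single transaction so the fact is never left half-loaded
        conn.execute("""
            DELETE FROM fact_line
            WHERE EXISTS (
                SELECT 1 FROM stg_line AS s
                WHERE s.booking_code = fact_line.booking_code
                  AND s.line_no = fact_line.line_no
            )
        """)
        conn.execute("""
            INSERT INTO fact_line (booking_code, line_no, guests, modified_date)
            SELECT booking_code, line_no, guests, modified_date FROM stg_line
        """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_line (booking_code TEXT, line_no INTEGER, guests INTEGER, modified_date TEXT);
CREATE TABLE stg_line  (booking_code TEXT, line_no INTEGER, guests INTEGER, modified_date TEXT);

INSERT INTO fact_line VALUES ('B1', 1, 2, '2024-03-20'), ('B2', 1, 1, '2024-04-01');

-- Staging holds everything modified in the overlap window: one changed line, one new line.
INSERT INTO stg_line VALUES ('B2', 1, 3, '2024-04-03'), ('B3', 1, 2, '2024-04-04');
""")

apply_staging(conn)
apply_staging(conn)  # rerunning with the same staging data does not create duplicates

fact_rows = conn.execute(
    "SELECT booking_code, line_no, guests FROM fact_line ORDER BY booking_code, line_no"
).fetchall()
print(fact_rows)
```

In SSIS terms, the two statements inside apply_staging correspond to what would go into an Execute SQL Task, with the data flow only responsible for filling the staging table.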

I would suggest having a deleted flag and updating that instead of deleting. Your performance will also be better.

This will enable you to perform an analysis of how the reservations are changing over a period of time. You will need to ensure that this flag is used in all the analysis so that there is no confusion.
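A minimal sketch of the deleted-flag approach follows, once more using sqlite3 as a stand-in for SQL Server. The is_deleted column and the fact_line_active view are assumptions; routing analysis through a view is just one possible way to make sure the flag is always applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_line (
    booking_code TEXT, line_no INTEGER, amount REAL,
    is_deleted INTEGER NOT NULL DEFAULT 0);

INSERT INTO fact_line (booking_code, line_no, amount) VALUES
    ('B1', 1, 100.0), ('B1', 2, 40.0), ('B2', 1, 80.0);

-- Analysis queries read from this view so cancelled lines are excluded by default.
CREATE VIEW fact_line_active AS
    SELECT booking_code, line_no, amount FROM fact_line WHERE is_deleted = 0;
""")

with conn:
    # The room on booking B1, line 2 is cancelled: flag the row instead of deleting it.
    conn.execute(
        "UPDATE fact_line SET is_deleted = 1 WHERE booking_code = ? AND line_no = ?",
        ("B1", 2),
    )

active_total = conn.execute("SELECT SUM(amount) FROM fact_line_active").fetchone()[0]
stored_rows = conn.execute("SELECT COUNT(*) FROM fact_line").fetchone()[0]
print(active_total, stored_rows)
```

The cancelled line stays in the table for change analysis, while totals taken from the view ignore it.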
