Cross Stream Data changes - EDW

I have a scenario where Data Stream B depends on Data Stream A. Whenever Data Stream A changes, Stream B has to be re-processed. So a common process is required to identify changes across data streams and trigger the re-processing tasks. Is there a good way to do this besides triggers?

Your question is rather unclear, and I think any answer depends very heavily on what your data looks like, how you load it, how you can identify changes, whether you need to show users multiple versions of one fact or dimension value, and so on.

Here is a short description of how we handle it; it may or may not help you:

  1. We load raw data incrementally every day, i.e. we load all data generated in the source system in the last 24 hours (I'm glossing over timing issues, but they aren't important here).
  2. We insert the raw data into a loading table; that table already contains all the data we have previously loaded from the same source.
  3. If rows are completely new (i.e. the PK value in the raw data is new), they are processed normally.
  4. If we find a row whose PK we already have in the table, we know it is an updated version of data that we've already processed.
  5. Where we find updated data, we flag it for special processing and re-generate any data that depends on it (this is all done in stored procedures).
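As a rough illustration of steps 2 to 5, here is a minimal Python sketch. It is not the stored-procedure implementation described above: the `LoadingTable` class, the `pk` field name and the in-memory dictionary are all assumptions made only to show the idea of classifying incoming rows by whether their PK has been seen before.

```python
from dataclasses import dataclass, field


@dataclass
class LoadingTable:
    """Hypothetical in-memory stand-in for the loading table, keyed by PK."""
    rows: dict = field(default_factory=dict)

    def load_batch(self, batch):
        """Insert one incremental batch and classify each row.

        Returns (new_pks, updated_pks): PKs seen for the first time, and
        PKs that were already present and so need special re-processing.
        """
        new_pks, updated_pks = [], []
        for row in batch:
            pk = row["pk"]  # assumed name of the primary-key field
            if pk in self.rows:
                # PK already loaded earlier: treat as an updated version.
                updated_pks.append(pk)
            else:
                # PK never seen before: normal processing.
                new_pks.append(pk)
            self.rows[pk] = row  # keep the latest version of the row
        return new_pks, updated_pks


table = LoadingTable()
day1 = table.load_batch([{"pk": 1, "value": "a"}, {"pk": 2, "value": "b"}])
day2 = table.load_batch([{"pk": 2, "value": "c"}, {"pk": 3, "value": "d"}])
```

After the second call, PK 2 ends up in the updated list and PK 3 in the new list, so only the data derived from PK 2 would be flagged for re-generation.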

I think you're asking how to do step 5, but that depends on which data changes and what your users expect to happen. For example, if one item in an order changes, we re-process the entire order to ensure that the order-level values are correct. If a customer's address changes, we have to re-assign that customer to a new sales region.
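The order example can be sketched roughly as follows. This is only an assumed shape for the logic, not the actual procedure: the function names, the `item_to_order` lookup and the `amount` field are hypothetical, chosen to show that a change to one item widens to a rebuild of the whole order.

```python
def orders_to_reprocess(updated_item_pks, item_to_order):
    """Map changed order items to the set of orders that need a rebuild.

    item_to_order is an assumed lookup from item PK to order id.
    """
    return {item_to_order[pk] for pk in updated_item_pks if pk in item_to_order}


def rebuild_order_totals(order_ids, items_by_order):
    """Recompute an order-level value (here a total) from all of its items."""
    return {
        order_id: sum(item["amount"] for item in items_by_order[order_id])
        for order_id in sorted(order_ids)
    }


# Hypothetical data: item 11 changed, so all of order "A" is re-processed.
item_to_order = {10: "A", 11: "A", 20: "B"}
items_by_order = {
    "A": [{"pk": 10, "amount": 5}, {"pk": 11, "amount": 7}],
    "B": [{"pk": 20, "amount": 3}],
}
affected = orders_to_reprocess([11], item_to_order)
totals = rebuild_order_totals(affected, items_by_order)
```

The unit of re-processing (a whole order here, a customer's region assignment in the address case) is a business decision, which is why the mapping from a changed row to its dependents has to be written per data set.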

There is no generic way to identify data changes and process them, because everyone's data and requirements are different, and everyone has a different toolset, different constraints and so on.

If you can make your question more specific, then maybe you'll get a better answer. For example, if you already have a working solution based on triggers, why do you want to change it? What problem are you having that is making you look for an alternative?


 