
What steps do I need to perform to clean up data that was improperly loaded into a dimension/fact table?

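Suppose there is a process that loads data into fact and dimension tables, and analysis shows that 100 million records were loaded incorrectly. What steps do I need to perform to clean up the data correctly?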

Here are two practices that help in that scenario:

  1. Take a backup or snapshot before each batch. In the case of a major error like this, you can roll back to the snapshot, then reload and process the correct data.

  2. Maintain an insert-only persistent staging area in the DW, such as a data vault, with each row stamped with a batch ID and timestamp. Remove the rows in error, and rebuild your facts and dimensions.
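
As a rough illustration of practice 2, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`stg_sales`, `fact_sales`), the columns, and the helper functions are all hypothetical; a real warehouse would use its own schema and engine, and rebuilding 100 million rows would likely be done in bulk or per partition rather than with a full delete and reinsert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_sales (            -- insert-only staging (hypothetical schema)
    batch_id  INTEGER NOT NULL,     -- which load produced the row
    loaded_at TEXT    NOT NULL,     -- load timestamp
    product   TEXT    NOT NULL,
    amount    REAL    NOT NULL
);
CREATE TABLE fact_sales (           -- derived table, always rebuildable from staging
    product      TEXT PRIMARY KEY,
    total_amount REAL NOT NULL
);
""")

def load_batch(batch_id, loaded_at, rows):
    """Append rows to staging, stamping each with the batch ID and timestamp."""
    with conn:
        conn.executemany(
            "INSERT INTO stg_sales VALUES (?, ?, ?, ?)",
            [(batch_id, loaded_at, product, amount) for product, amount in rows],
        )

def rebuild_fact():
    """Recompute the fact table from whatever is currently in staging."""
    with conn:
        conn.execute("DELETE FROM fact_sales")
        conn.execute(
            "INSERT INTO fact_sales "
            "SELECT product, SUM(amount) FROM stg_sales GROUP BY product"
        )

def remove_batch(bad_batch_id):
    """Drop every staged row from the bad batch, then rebuild the fact table."""
    with conn:
        conn.execute("DELETE FROM stg_sales WHERE batch_id = ?", (bad_batch_id,))
    rebuild_fact()

load_batch(1, "2024-01-01T00:00:00", [("widget", 10.0), ("gadget", 5.0)])
load_batch(2, "2024-01-02T00:00:00", [("widget", 999.0)])  # the bad load
rebuild_fact()

remove_batch(2)
fact = dict(conn.execute("SELECT product, total_amount FROM fact_sales"))
```

Because the fact table is derived only from staging, removing the bad batch by its ID and rebuilding is enough to return to a clean state in this sketch.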

If this represents a real situation, your only chance is #1.

If you don't have a reliable backup, and you have updated and/or deleted rows during the ETL/ELT process, you don't have any record of the pre-failure state, and it may be impossible to go back.
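
To make the snapshot-and-rollback idea concrete, here is a small sketch using the `Connection.backup` method from Python's `sqlite3` module. The `dim_customer` table and the `snapshot`/`restore` helpers are hypothetical, and a production warehouse would rely on its own backup or snapshot facility instead; the point is only that destructive updates can be undone when a pre-batch copy exists.

```python
import sqlite3

def snapshot(src):
    """Copy the whole database into a fresh in-memory connection."""
    snap = sqlite3.connect(":memory:")
    src.backup(snap)
    return snap

def restore(snap, dst):
    """Overwrite dst with the contents of the snapshot."""
    snap.backup(dst)

dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT)")
dw.execute("INSERT INTO dim_customer VALUES (1, 'Alice')")
dw.commit()

before_batch = snapshot(dw)  # taken before the batch runs

# A bad batch that updates in place and inserts bogus rows.
dw.execute("UPDATE dim_customer SET name = 'WRONG' WHERE id = 1")
dw.execute("INSERT INTO dim_customer VALUES (2, 'Bogus')")
dw.commit()

restore(before_batch, dw)  # roll back to the pre-batch state
rows = dw.execute("SELECT id, name FROM dim_customer ORDER BY id").fetchall()
```

Without `before_batch`, the original value of the updated row would be gone, which is the situation described above.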
