Loading unfinished data to a data warehouse
The title might be confusing, so let me describe my current problem.
Please imagine the following situation: a system stores devices' issues, which should be fixed by qualified workers. I have a table "issue" with:
and other columns. I also have a data warehouse which will store the "issues" and describe the performance of those "workers" (mostly working time).
During the ETL process, the biggest problem comes with "unsolved issues". I see two possibilities:
a) Process only solved "issues" and leave unsolved ones until they are finished, then process them. However, my reports would then not include issues that take too long to finish, which might be crucial from a business perspective.
b) Process both solved and unsolved issues; the PK in the fact table could be issueId and status. But then I would store almost identical issues, which might be odd and difficult to analyze.
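To make the concern in option b) concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything beyond issueId and status is an assumption for illustration: the table name fact_issue, the columns worker_id and hours_worked, the status values 'open' and 'solved', and the idea that hours_worked is cumulative at load time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical fact table for option b): composite PK (issue_id, status).
conn.execute("""
    CREATE TABLE fact_issue (
        issue_id     INTEGER NOT NULL,
        status       TEXT    NOT NULL,  -- assumed values: 'open', 'solved'
        worker_id    INTEGER NOT NULL,
        hours_worked REAL    NOT NULL,  -- assumed cumulative at load time
        PRIMARY KEY (issue_id, status)
    )
""")

def load_issue(issue_id, status, worker_id, hours_worked):
    # One row per (issue_id, status); reloading the same pair overwrites it.
    conn.execute(
        "INSERT OR REPLACE INTO fact_issue VALUES (?, ?, ?, ?)",
        (issue_id, status, worker_id, hours_worked),
    )

# First ETL run: both issues are still open.
load_issue(1, "open", 10, 2.5)
load_issue(2, "open", 11, 1.0)
# Later ETL run: issue 1 has been solved in the meantime.
load_issue(1, "solved", 10, 6.0)

rows = conn.execute(
    "SELECT issue_id, status, hours_worked FROM fact_issue "
    "ORDER BY issue_id, status"
).fetchall()

# A naive aggregate counts issue 1 twice (its open row and its solved row).
naive_total = conn.execute(
    "SELECT SUM(hours_worked) FROM fact_issue"
).fetchone()[0]

# Every report has to pick one row per issue, e.g. prefer the solved row.
deduped_total = conn.execute("""
    SELECT SUM(f.hours_worked)
    FROM fact_issue AS f
    WHERE f.status = 'solved'
       OR NOT EXISTS (
            SELECT 1 FROM fact_issue AS s
            WHERE s.issue_id = f.issue_id AND s.status = 'solved'
       )
""").fetchone()[0]

print(rows)
print(naive_total, deduped_total)
```

Under these assumptions, issue 1 ends up with two nearly identical rows, so every query over the fact table needs the extra filter to avoid double counting.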
Is this a common situation? Which of these two possibilities seems more reasonable? Or is there perhaps another, better way to do this?
It seems like there should be an issues dimension, and that dimension would hold the status column. There are a couple of issues with changing facts: