
Updating of Fact tables

I have flat-file sources that are extracted into facts and dimensions. Some dimensions also come from database sources. The transformation process runs on an as-needed basis (whenever there are new or updated flat files). The problem is that some data references don't exist in, or don't match, the dimension built from the database source, so the foreign key id value on the fact is set to the default (zero if there is no matching data).

How can I update the facts once that dimension (database source) has been updated? What is the best practice/routine for this kind of scenario?

This is a sample illustration:

Flatfile source                           product list (db source)
--------------------------------          ------------------------------
| product name | year | volume |          | prodcode |  name           |
--------------------------------          ------------------------------
| apple        | 2020 |  1000  |          | 001      | apple           |
| watermelon   | 2020 |  2000  |          | 002      | mango           |
--------------------------------          ------------------------------

Fact/Dimension

production_fact                           dim_product
-------------------------------          ---------------------------
| fk_product| fk_date| volume |          | id | prodcode |  name   |
-------------------------------          ---------------------------
| 2         |  d001  |  1000  |          |  1 |  n/a      | n/a    |
| 1         |  d001  |  2000  |          |  2 |  001      | apple  |
-------------------------------          |  3 |  002      | mango  |
                                         ---------------------------

If the product list is updated (003 watermelon), should I replace dim_product row #1 with the new value?

Based on your example, this is the way it should work:

Note: I would expect prodcode to be in the flat file, not the product name. Is this really how your data looks? Anyway, I will proceed.

The first set of data arrives. Watermelon is in the fact input but not in the dimension.

Flatfile source                           product list (db source)
--------------------------------          ------------------------------
| product name | year | volume |          | prodcode |  name           |
--------------------------------          ------------------------------
| apple        | 2020 |  1000  |          | 001      | apple           |
| watermelon   | 2020 |  2000  |          | 002      | mango           |
--------------------------------          ------------------------------

We load a dimension record, but it won't have any attribute values. (As I said, I would normally expect the code to be in the fact input data, but that's fine, we'll go with the description.) This will of course require some logic to find dimension members that appear in the fact input but not in the dimension.

production_fact                           dim_product
-------------------------------      ------------------------------------------------
| fk_product| fk_date| volume |      | id | prodcode |  name       | weight |colour |
-------------------------------      ------------------------------------------------
| 2         |  d001  |  1000  |      |  1 |  n/a      | n/a        | n/a    | n/a   |
| 4         |  d001  |  2000  |      |  2 |  001      | apple      | 200mg  | red   |
-------------------------------      |  3 |  002      | mango      | 400mg  | yellow|
                                     |  4 |  ?        | watermelon | ?      |   ?   |
                                     ------------------------------------------------

So we have dimension SK 4, which is a legitimate dimension record except that it's missing a load of attributes.
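The "find members that are in the fact input but not in the dimension" step can be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module, not a definitive implementation: the staging table `stg_flatfile`, the match on product name, and the hard-coded date key `'d001'` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        id       INTEGER PRIMARY KEY,   -- surrogate key, assigned automatically
        prodcode TEXT,
        name     TEXT,
        weight   TEXT,
        colour   TEXT
    );
    INSERT INTO dim_product VALUES
        (1, 'n/a', 'n/a',   'n/a',   'n/a'),
        (2, '001', 'apple', '200mg', 'red'),
        (3, '002', 'mango', '400mg', 'yellow');

    CREATE TABLE production_fact (fk_product INTEGER, fk_date TEXT, volume INTEGER);

    -- Hypothetical staging table holding the flat-file rows.
    CREATE TABLE stg_flatfile (product_name TEXT, year INTEGER, volume INTEGER);
    INSERT INTO stg_flatfile VALUES ('apple', 2020, 1000), ('watermelon', 2020, 2000);
""")

# Step 1: add a placeholder dimension row for every product name that is in
# the flat file but not yet in the dimension. Only the name is known, so the
# other attributes stay NULL (shown as '?' in the table above).
conn.execute("""
    INSERT INTO dim_product (name)
    SELECT DISTINCT s.product_name
    FROM stg_flatfile AS s
    LEFT JOIN dim_product AS d ON d.name = s.product_name
    WHERE d.id IS NULL
""")

# Step 2: load the fact. Every staged row now finds a dimension row, so no
# fact row has to fall back to the 'n/a' member.
# The date key is hard-coded to 'd001' purely to keep the sketch short.
conn.execute("""
    INSERT INTO production_fact (fk_product, fk_date, volume)
    SELECT d.id, 'd001', s.volume
    FROM stg_flatfile AS s
    JOIN dim_product AS d ON d.name = s.product_name
""")

fact_rows = conn.execute(
    "SELECT fk_product, fk_date, volume FROM production_fact ORDER BY volume"
).fetchall()
placeholder = conn.execute(
    "SELECT id, prodcode, name, weight, colour FROM dim_product WHERE name = 'watermelon'"
).fetchone()
```

Here `fact_rows` holds the apple row pointing at SK 2 and the watermelon row pointing at the new SK 4, and `placeholder` is the watermelon row with only its name filled in.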

Later, the dimension data arrives. We know what it's meant to match on, so we update the existing dimension record that was missing data.

 product list (db source)
-----------------------------------------------
| prodcode |  name           | weight |colour |
-----------------------------------------------
| 003      | watermelon      | 1kg    | green |
-----------------------------------------------


------------------------------------------------
| id | prodcode |  name       | weight |colour |
------------------------------------------------
|  1 |  n/a      | n/a        | n/a    | n/a   |
|  2 |  001      | apple      | 200mg  | red   |
|  3 |  002      | mango      | 400mg  | yellow|
|  4 |  003      | watermelon | 1kg    | green |
------------------------------------------------
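The late-arriving update above can be sketched like this, again with Python's sqlite3 module. The helper `upsert_product`, the source table `src_product`, and the match on product name are assumptions for illustration only; the point is that the placeholder row is filled in place and the fact table is never touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        id INTEGER PRIMARY KEY, prodcode TEXT, name TEXT, weight TEXT, colour TEXT
    );
    INSERT INTO dim_product VALUES
        (1, 'n/a', 'n/a',        'n/a',   'n/a'),
        (2, '001', 'apple',      '200mg', 'red'),
        (3, '002', 'mango',      '400mg', 'yellow'),
        (4, NULL,  'watermelon', NULL,    NULL);      -- placeholder row

    CREATE TABLE production_fact (fk_product INTEGER, fk_date TEXT, volume INTEGER);
    INSERT INTO production_fact VALUES (2, 'd001', 1000), (4, 'd001', 2000);

    -- Hypothetical extract of the product list (db source).
    CREATE TABLE src_product (prodcode TEXT, name TEXT, weight TEXT, colour TEXT);
    INSERT INTO src_product VALUES ('003', 'watermelon', '1kg', 'green');
""")

def upsert_product(conn, prodcode, name, weight, colour):
    """Fill in an existing dimension row matched by name, else insert a new one."""
    cur = conn.execute(
        "UPDATE dim_product SET prodcode = ?, weight = ?, colour = ? WHERE name = ?",
        (prodcode, weight, colour, name),
    )
    if cur.rowcount == 0:  # no existing row (placeholder or otherwise) matched
        conn.execute(
            "INSERT INTO dim_product (prodcode, name, weight, colour) VALUES (?, ?, ?, ?)",
            (prodcode, name, weight, colour),
        )

for row in conn.execute("SELECT prodcode, name, weight, colour FROM src_product").fetchall():
    upsert_product(conn, *row)

# The fact rows are unchanged, yet a join now returns the full attributes.
report = conn.execute("""
    SELECT d.prodcode, d.name, f.volume
    FROM production_fact AS f
    JOIN dim_product AS d ON d.id = f.fk_product
    ORDER BY f.volume
""").fetchall()
dim_count = conn.execute("SELECT COUNT(*) FROM dim_product").fetchone()[0]
```

Because the surrogate key 4 is kept, the watermelon fact row picks up prodcode 003 through the join without any update to `production_fact`.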

You want to avoid ever updating large facts. Updating smaller dimensions is a much better idea.

BTW this is a type 1 dimension. You can take the same approach with an SCD (slowly changing dimension), except that you wouldn't count the first, placeholder version of the dimension row as a version; you'd just overwrite it.
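One way to read the SCD remark, assuming a history-keeping (type 2 style) dimension, is sketched below. The columns `valid_from`, `is_current` and `is_inferred`, the helper `apply_change`, and the later 2kg weight change are all hypothetical and exist only to illustrate the idea: a placeholder row is overwritten in place, while a change to a real row creates a new version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        id          INTEGER PRIMARY KEY,
        prodcode    TEXT,
        name        TEXT,
        weight      TEXT,
        valid_from  TEXT,
        is_current  INTEGER,
        is_inferred INTEGER    -- 1 = placeholder created from fact input
    );
    -- Placeholder created when watermelon first showed up in the flat file.
    INSERT INTO dim_product (name, valid_from, is_current, is_inferred)
    VALUES ('watermelon', '2020-01-01', 1, 1);
""")

def apply_change(conn, prodcode, name, weight, as_of):
    """Apply one source row to the versioned dimension (matched by name)."""
    row = conn.execute(
        "SELECT id, is_inferred FROM dim_product WHERE name = ? AND is_current = 1",
        (name,),
    ).fetchone()
    if row is not None and row[1] == 1:
        # Placeholder: overwrite in place, do not count it as a version.
        conn.execute(
            "UPDATE dim_product SET prodcode = ?, weight = ?, is_inferred = 0 WHERE id = ?",
            (prodcode, weight, row[0]),
        )
        return
    if row is not None:
        # Real row: expire it before adding the new version.
        # (A fuller version would first check whether any attribute changed.)
        conn.execute("UPDATE dim_product SET is_current = 0 WHERE id = ?", (row[0],))
    conn.execute(
        "INSERT INTO dim_product (prodcode, name, weight, valid_from, is_current, is_inferred) "
        "VALUES (?, ?, ?, ?, 1, 0)",
        (prodcode, name, weight, as_of),
    )

apply_change(conn, '003', 'watermelon', '1kg', '2020-06-01')  # fills the placeholder
apply_change(conn, '003', 'watermelon', '2kg', '2021-01-01')  # made-up later change

versions = conn.execute(
    "SELECT id, weight, is_current FROM dim_product ORDER BY id"
).fetchall()
```

After both calls there are two rows rather than three: the placeholder became the first real version (id 1), and only the later change produced a new row (id 2).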

