简体   繁体   中英

Need help modeling a fact table

I'm moving my first steps into datawarehousing.

I've bought the excellent book "The Data Warehouse Toolkit - Third Edition" by Kimball & Ross, which explained me how to grasp the fundamental concepts.
Today I've started designing my second data mart, but I'm already stuck with a (maybe stupid) problem. Suppose I'm modeling a simple sales event: a trivial fact table would be:

DATE_ID | CUSTOMER_ID | PRODUCT_ID | QUANTITY

every dimension has a many-to-many relationship with the others, as explained in the book and on the web.
Next, I'd like to add some more dimensions, like the carrier:

DATE_ID | CUSTOMER_ID | PRODUCT_ID | CARRIER_ID | QUANTITY

the dimensions are still in a many-to-many relation.
Now, I'm asked to add a lot (maybe a dozen or more) of details about the delivery, like a bunch of dates, routing, number of boxes and pallets, various flags etc, so I was thinking about a DELIVERY dimension table. My first attempt was:

DATE_ID | CUSTOMER_ID | PRODUCT_ID | CARRIER_ID | DELIVERY_ID | QUANTITY

but... surprise, the fact table now it's not in a many-to-many relationship anymore. So I thought: "well, I could refactor it, because now the other dimensions are in fact attributes of the delivery", but it would become

DELIVERY_ID | PRODUCT_ID | QUANTITY

and my fact table would just have 2 dimensions.
Now, in other situations I would treat the delivery as a degenerate dimension, but since I have to associate to it a lot of attibutes, I don't which route to follow:

  • create a DELIVERY dimension and refactor the fact table?
  • throw them in the fact table?
  • create a DELIVERY dimension and put the DELIVERY_ID in the fact table pretending that it's only a degenerate dimension?

Maybe it's not so simple to choose between dimensions and facts

As you describe it, delivery is a separate event in respect to the sale . So delivery should be a separate fact table.

Of course you could always "project" (so to speak) a fact in a dimension, if you do not need the added complexity. For example, say that you only need to know some simple fact about delivery: for example the carrier and the date of delivery. Then you could use a DELIVERY_ID in SALES and register these information in the DELIVERY dimension.

But if you have to register the full complexity of a delivery (there could be two or more deliveries relative to one sale, and two or more sales relative to one delivery) you need two fact tables.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM