Data cube design: hard-to-aggregate measure

I'm in the process of designing the fact table for a data cube, and I have a measure which I don't really know how to correctly aggregate. The following SQL code will create a small sample fact table and dimension table:

create table FactTable (
    ID      int,
    Color   varchar(20),
    Flag    int)

insert into FactTable (ID, Color, Flag) values (1, 'RED',   1)
insert into FactTable (ID, Color, Flag) values (1, 'WHITE', 0)
insert into FactTable (ID, Color, Flag) values (1, 'BLUE',  1)
insert into FactTable (ID, Color, Flag) values (2, 'RED',   0)
insert into FactTable (ID, Color, Flag) values (2, 'WHITE', 0)
insert into FactTable (ID, Color, Flag) values (2, 'BLUE',  1)
insert into FactTable (ID, Color, Flag) values (3, 'RED',   1)
insert into FactTable (ID, Color, Flag) values (3, 'WHITE', 1)
insert into FactTable (ID, Color, Flag) values (3, 'BLUE',  1)

create table ColorDim (
    CID     int, 
    Color   varchar(20))

insert into ColorDim (CID, Color) values (1, 'RED')
insert into ColorDim (CID, Color) values (2, 'WHITE')
insert into ColorDim (CID, Color) values (3, 'BLUE')

FactTable and ColorDim are joined on FactTable.Color = ColorDim.Color. In the cube, there should be a measure called 'Patriotic' which counts object IDs including the colors red, white, or blue (at least one of the colors). The desired output is as follows:

  • When browsing the cube, if the user pulls in the Patriotic measure (pulling no dimensions), the total shown should be 2, since there are 2 IDs (namely, 1 and 3) which include at least one of the three colors. Notice that ID 1 should contribute 1 to the total Patriotic value, even though it has two of the colors.
  • If the user browses the Patriotic measure by the Color dimension, they should see a table like the following. Note that ID 1 contributes 1 to the RED count and 1 to the BLUE count.

    +-------+-----------+
    | Color | Patriotic |
    +-------+-----------+
    | RED   |         2 |
    | WHITE |         1 |
    | BLUE  |         2 |
    +-------+-----------+

I'm sure this is a very basic cube design situation, but I haven't worked with cubes much before, and the measures I've used are usually simple SUMs of columns, or products of SUMs of columns, etc. Any help would be much appreciated.

(If it's relevant, I'm running the SQL queries which build the fact/dimension tables in MS SQL Server 2008, and I'll be designing the cube itself in MS Visual Studio 2008.)

I'll give it a try, although I'm not 100% sure I understand the questions. Also I don't want to post queries into comments to verify if they are valid. If I'm way off and this is not helpful, I'll delete the answer.

When browsing the cube, if the user pulls in the Patriotic measure (pulling no dimensions), the total shown should be 2, since there are 2 IDs (namely, 1 and 3) which include at least one of the three colors. Notice that ID 1 should contribute 1 to the total Patriotic value, even though it has two of the colors.

-- Count the IDs that have at least two rows with Flag = 1
WITH MyCTE (id, FlagCount)
AS
(
    select id, count(flag) as FlagCount
    from FactTable
    where Flag = 1
    group by id
    having count(flag) >= 2
)
select count(*) from MyCTE

If the user browses the Patriotic measure by the Color dimension, they should see a table like the following. Note that ID 1 contributes 1 to the RED count and 1 to the BLUE count.

select a.Color, COUNT(*)
from FactTable a
    join ColorDim b
    on a.Color = b.Color
where Flag = 1
group by a.Color

I'm not entirely sure why your fact table needs to be a cross join between "ID" and "Color". You can simply eliminate all Flag=0 rows and use a simple count of the ID column as your Patriotic measure; a distinct count will then give you the total number of Patriotic IDs.

You also do not need a Color dimension as there is no extra information being provided by the ColorDim table.

However, if more colours were added to the rows, you would be able to add the 'Patriotic' flag to the ColorDim table. Any queries would then be able to filter by the 'Patriotic' flag and still get accurate counts for Patriotic rows.

create table FactTable (
    ID      int,
    Color   varchar(20)
    )

insert into FactTable (ID, Color) values (1, 'RED')
insert into FactTable (ID, Color) values (1, 'BLUE')
insert into FactTable (ID, Color) values (2, 'BLUE')
insert into FactTable (ID, Color) values (3, 'RED')
insert into FactTable (ID, Color) values (3, 'WHITE')
insert into FactTable (ID, Color) values (3, 'BLUE')

create table ColorDim (
    CID     int, 
    Color   varchar(20),
    PatrioticFlag int
)

insert into ColorDim (CID, Color, PatrioticFlag) values (1, 'RED',   1)
insert into ColorDim (CID, Color, PatrioticFlag) values (2, 'WHITE', 1)
insert into ColorDim (CID, Color, PatrioticFlag) values (3, 'BLUE',  1)
insert into ColorDim (CID, Color, PatrioticFlag) values (4, 'BEIGE', 0)
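
As a rough sketch of the counts this design is meant to produce, the two queries below run a distinct count of ID against temp-table copies of the reduced tables, filtered on PatrioticFlag. The #Fact and #ColorDim names are only illustrative, and the multi-row VALUES syntax assumes SQL Server 2008 or later. In the cube, the equivalent would presumably be a measure on ID using the Distinct Count aggregation, but that part is an assumption and is not shown here.

```sql
-- Temp-table copies of the reduced tables above (the # names are illustrative).
create table #Fact (ID int, Color varchar(20));
create table #ColorDim (CID int, Color varchar(20), PatrioticFlag int);

insert into #Fact (ID, Color) values
    (1, 'RED'), (1, 'BLUE'),
    (2, 'BLUE'),
    (3, 'RED'), (3, 'WHITE'), (3, 'BLUE');

insert into #ColorDim (CID, Color, PatrioticFlag) values
    (1, 'RED', 1), (2, 'WHITE', 1), (3, 'BLUE', 1), (4, 'BEIGE', 0);

-- Overall measure: each ID counts once, however many patriotic colors it has.
select count(distinct f.ID) as Patriotic
from #Fact f
    join #ColorDim d on f.Color = d.Color
where d.PatrioticFlag = 1;

-- Same measure sliced by color: an ID counts once under each of its colors.
select f.Color, count(distinct f.ID) as Patriotic
from #Fact f
    join #ColorDim d on f.Color = d.Color
where d.PatrioticFlag = 1
group by f.Color;

drop table #Fact;
drop table #ColorDim;
```

Because the filter lives on the dimension row, adding a non-patriotic color such as BEIGE to the fact table would not change either result.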

I finally figured it out. First, I added one row per ID to the fact table containing pre-aggregated data for that ID, so the fact table becomes:

create table FactTable (
    ID      int,
    Color   varchar(20),
    Flag    int)

insert into FactTable (ID, Color, Flag) values (1, 'RED',   1)
insert into FactTable (ID, Color, Flag) values (1, 'WHITE', 0)
insert into FactTable (ID, Color, Flag) values (1, 'BLUE',  1)
insert into FactTable (ID, Color, Flag) values (1, 'PATRIOTIC',  1)
insert into FactTable (ID, Color, Flag) values (2, 'RED',   0)
insert into FactTable (ID, Color, Flag) values (2, 'WHITE', 0)
insert into FactTable (ID, Color, Flag) values (2, 'BLUE',  1)
insert into FactTable (ID, Color, Flag) values (2, 'PATRIOTIC',  1)
insert into FactTable (ID, Color, Flag) values (3, 'RED',   1)
insert into FactTable (ID, Color, Flag) values (3, 'WHITE', 1)
insert into FactTable (ID, Color, Flag) values (3, 'BLUE',  1)
insert into FactTable (ID, Color, Flag) values (3, 'PATRIOTIC',  1)
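
The pre-aggregated rows don't have to be typed by hand. A minimal sketch, assuming the table starts with only the per-color rows and that an ID is patriotic when any of its colors has Flag = 1, is to derive them with MAX(Flag) per ID. The #Fact temp table is just an illustrative stand-in for FactTable.

```sql
-- Start from the per-color rows only (the #Fact name is illustrative).
create table #Fact (ID int, Color varchar(20), Flag int);

insert into #Fact (ID, Color, Flag) values
    (1, 'RED', 1), (1, 'WHITE', 0), (1, 'BLUE', 1),
    (2, 'RED', 0), (2, 'WHITE', 0), (2, 'BLUE', 1),
    (3, 'RED', 1), (3, 'WHITE', 1), (3, 'BLUE', 1);

-- One summary row per ID: MAX(Flag) is 1 if any color is flagged, else 0.
insert into #Fact (ID, Color, Flag)
select ID, 'PATRIOTIC', max(Flag)
from #Fact
where Color <> 'PATRIOTIC'   -- guard against re-aggregating summary rows
group by ID;

-- Inspect the result: each ID now carries its own 'PATRIOTIC' row.
select ID, Color, Flag
from #Fact
order by ID, Color;

drop table #Fact;
```

The `Color <> 'PATRIOTIC'` guard keeps a re-run from reading existing summary rows as input, but it does not prevent a re-run from inserting duplicates, so delete the old summary rows first in that case.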

Similarly, add a row to the color dimension table:

create table ColorDim (
    CID     int, 
    Color   varchar(20))

insert into ColorDim (CID, Color) values (1, 'RED')
insert into ColorDim (CID, Color) values (2, 'WHITE')
insert into ColorDim (CID, Color) values (3, 'BLUE')
insert into ColorDim (CID, Color) values (4, 'PATRIOTIC')

Then, in MS Visual Studio, edit the DefaultMember property of the Color attribute in the Color Dimension as:

[Color Dimension].[ColorDim].&[PATRIOTIC]

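Setting the DefaultMember property means that queries which don't slice by Color read the 'PATRIOTIC' member instead of aggregating over all colors. The fact rows with Color 'PATRIOTIC' are already aggregates of the other rows with the same ID and other Color values, so the unsliced total counts each ID once.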
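
To see why this works, the two browsing cases can be approximated with plain SQL. This is only a relational sketch, assuming the Patriotic measure is a simple Sum over the Flag column. It is not what Analysis Services runs internally, and the #Fact temp table is an illustrative copy of the extended fact table.

```sql
-- Temp copy of the extended fact table (the #Fact name is illustrative).
create table #Fact (ID int, Color varchar(20), Flag int);

insert into #Fact (ID, Color, Flag) values
    (1, 'RED', 1), (1, 'WHITE', 0), (1, 'BLUE', 1), (1, 'PATRIOTIC', 1),
    (2, 'RED', 0), (2, 'WHITE', 0), (2, 'BLUE', 1), (2, 'PATRIOTIC', 1),
    (3, 'RED', 1), (3, 'WHITE', 1), (3, 'BLUE', 1), (3, 'PATRIOTIC', 1);

-- No Color on the axes: the default member applies, so only the
-- pre-aggregated 'PATRIOTIC' rows are summed.
select sum(Flag) as Patriotic
from #Fact
where Color = 'PATRIOTIC';

-- Color on rows: each member, including PATRIOTIC itself, shows its own sum.
select Color, sum(Flag) as Patriotic
from #Fact
group by Color;

drop table #Fact;
```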
