
Populate fact table with foreign keys

I'm working on a project where I need to analyze Apache logs using SSAS. I've already loaded the data into a temporary table. I created dimension tables (primary key and attribute_name), an empty fact table (a foreign key for each dimension table plus fact_attribute), and the relationships between them. Then I split the data from that table into the dimension tables using

INSERT INTO DimIP (IP) SELECT DISTINCT RemoteHostName FROM tmp

...and so on.
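The dimension-loading step above can be sketched end to end. This is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server; the staging columns follow the question, but the sample rows are invented and only the IP dimension is shown. The UNIQUE constraint is an added assumption that guards against loading a value twice.

```python
import sqlite3

# In-memory SQLite stand-in for the staging table from the question.
# Sample rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp (DateTime TEXT, RemoteHostName TEXT, StatusCode INTEGER)")
conn.executemany(
    "INSERT INTO tmp VALUES (?, ?, ?)",
    [
        ("2020-01-01", "10.0.0.1", 200),
        ("2020-01-01", "10.0.0.2", 404),
        ("2020-01-02", "10.0.0.1", 200),
    ],
)

# Dimension table: surrogate key (auto-assigned by INTEGER PRIMARY KEY)
# plus the attribute value; UNIQUE rejects duplicate values.
conn.execute("CREATE TABLE DimIP (ID INTEGER PRIMARY KEY, IP TEXT NOT NULL UNIQUE)")

# Same pattern as the question: one dimension row per distinct staging value.
conn.execute("INSERT INTO DimIP (IP) SELECT DISTINCT RemoteHostName FROM tmp")

dim_ip = conn.execute("SELECT ID, IP FROM DimIP ORDER BY IP").fetchall()
print(dim_ip)
```

The IDs themselves are arbitrary surrogate keys; only the mapping from value to ID matters when the fact table is loaded later.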

Now I need to populate the Facts table with the foreign keys, but I have no idea how to do it with a single query. I tried something like this:

INSERT INTO Facts (DimDateID, DimIPID, DimRefererID, DimRequestID, DimStatusCodeID, DimUserAgentID)
SELECT DimDate.ID WHERE (DimDate.Data = tmp.DateTime)
SELECT DimIP.ID WHERE (DimIP.IP = tmp.RemoteHostName)
SELECT DimReferer.ID WHERE (DimReferer.Referer = tmp.Referer)
SELECT DimRequest.ID WHERE (DimRequest.Request = tmp.Request)
SELECT DimStatusCode.ID WHERE (DimStatusCode.StatusCode = tmp.StatusCode)
SELECT DimUserAgent.ID WHERE (DimUserAgent.UserAgent = tmp.UserAgent)

But it doesn't work (the error says the insert list contains fewer items than the select list); I probably can't use this syntax.
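For comparison, the closest valid form of this attempt is one INSERT with one SELECT that has a FROM tmp clause, where each column of the select list is a correlated scalar subquery looking up the key for the current tmp row. A minimal sketch in SQLite via Python's sqlite3, assuming only two dimensions and invented sample rows. In SQL Server such a subquery raises an error if it returns more than one row, which is one reason the dimensions must hold distinct values; SQLite silently uses the first row instead.

```python
import sqlite3

# Hypothetical minimal schema: staging table, two dimensions, fact table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp (DateTime TEXT, RemoteHostName TEXT);
INSERT INTO tmp VALUES ('2020-01-01', '10.0.0.1'), ('2020-01-02', '10.0.0.2');
CREATE TABLE DimDate (ID INTEGER PRIMARY KEY, Data TEXT UNIQUE);
CREATE TABLE DimIP (ID INTEGER PRIMARY KEY, IP TEXT UNIQUE);
INSERT INTO DimDate (Data) SELECT DISTINCT DateTime FROM tmp;
INSERT INTO DimIP (IP) SELECT DISTINCT RemoteHostName FROM tmp;
CREATE TABLE Facts (DimDateID INTEGER NOT NULL, DimIPID INTEGER NOT NULL);
""")

# One INSERT, one SELECT, one FROM: each output column is a correlated
# scalar subquery evaluated once per tmp row.
conn.execute("""
INSERT INTO Facts (DimDateID, DimIPID)
SELECT
    (SELECT DimDate.ID FROM DimDate WHERE DimDate.Data = tmp.DateTime),
    (SELECT DimIP.ID   FROM DimIP   WHERE DimIP.IP     = tmp.RemoteHostName)
FROM tmp
""")

fact_count = conn.execute("SELECT COUNT(*) FROM Facts").fetchone()[0]
# Join back to the dimensions to confirm each fact points at the right values.
round_trip = conn.execute("""
    SELECT d.Data, i.IP
    FROM Facts AS f
    JOIN DimDate AS d ON d.ID = f.DimDateID
    JOIN DimIP AS i ON i.ID = f.DimIPID
    ORDER BY d.Data
""").fetchall()
print(fact_count, round_trip)
```

The join-based query in the answer does the same lookups and is usually easier to read and extend.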

I also tried doing it one column at a time, like this:

INSERT INTO Facts (DimDateID)
SELECT DimDate.ID WHERE (DimDate.Data = tmp.DateTime)

But sometimes it complains that another column can't be NULL (e.g. DimUserAgentID), so the query fails; other times it executes and reports "806000 rows affected", but nothing is inserted.
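The NOT NULL failure follows from how INSERT works: each INSERT ... SELECT appends new rows and never fills in a column of rows that an earlier statement created. A minimal sketch in SQLite via Python's sqlite3, assuming two dimensions and invented sample rows; FactsNullable is a hypothetical table added only to show what happens when the constraint is absent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp (DateTime TEXT, UserAgent TEXT);
INSERT INTO tmp VALUES ('2020-01-01', 'curl'), ('2020-01-02', 'Firefox');
CREATE TABLE DimDate (ID INTEGER PRIMARY KEY, Data TEXT UNIQUE);
INSERT INTO DimDate (Data) SELECT DISTINCT DateTime FROM tmp;
-- Fact table whose key columns are NOT NULL, as the error in the question implies.
CREATE TABLE Facts (DimDateID INTEGER NOT NULL, DimUserAgentID INTEGER NOT NULL);
""")

# Filling only DimDateID creates new rows whose DimUserAgentID would be NULL,
# so the NOT NULL constraint rejects the statement.
try:
    conn.execute("""
        INSERT INTO Facts (DimDateID)
        SELECT DimDate.ID FROM tmp JOIN DimDate ON DimDate.Data = tmp.DateTime
    """)
    failed = False
except sqlite3.IntegrityError:
    failed = True
rows_after = conn.execute("SELECT COUNT(*) FROM Facts").fetchone()[0]

# Without NOT NULL both statements succeed, but each appends its own rows:
# date keys and user-agent keys never land on the same fact row.
conn.executescript("""
CREATE TABLE DimUserAgent (ID INTEGER PRIMARY KEY, UserAgent TEXT UNIQUE);
INSERT INTO DimUserAgent (UserAgent) SELECT DISTINCT UserAgent FROM tmp;
CREATE TABLE FactsNullable (DimDateID INTEGER, DimUserAgentID INTEGER);
INSERT INTO FactsNullable (DimDateID)
    SELECT DimDate.ID FROM tmp JOIN DimDate ON DimDate.Data = tmp.DateTime;
INSERT INTO FactsNullable (DimUserAgentID)
    SELECT DimUserAgent.ID FROM tmp JOIN DimUserAgent ON DimUserAgent.UserAgent = tmp.UserAgent;
""")
total = conn.execute("SELECT COUNT(*) FROM FactsNullable").fetchone()[0]
complete = conn.execute(
    "SELECT COUNT(*) FROM FactsNullable WHERE DimDateID IS NOT NULL AND DimUserAgentID IS NOT NULL"
).fetchone()[0]
print(failed, rows_after, total, complete)
```

So even when the column-by-column approach runs, it produces twice as many rows, none of them complete.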

I'd appreciate your help; I've already pulled out half my hair and still don't know how to populate the fact table with foreign keys from the dimension tables.

I believe what you need to do is reference those other tables in your query. Below I use tmp as the driver of the query and look up each ID using the matching logic you provided. The lookups are LEFT OUTER JOINs, so when a value has no match in its dimension table, NULL goes into your fact table (and if the fact table's key columns are NOT NULL, as your error suggests, that makes the whole INSERT fail). If you'd rather filter such rows out of the fact table, replace every LEFT OUTER JOIN with an INNER JOIN. I also assumed your tables are all in the dbo schema.
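The LEFT versus INNER behaviour described above can be checked on a toy data set. A minimal sketch in SQLite via Python's sqlite3, assuming two dimensions and invented rows; status code 404 is deliberately left out of DimStatusCode to simulate a failed lookup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp (RemoteHostName TEXT, StatusCode INTEGER);
INSERT INTO tmp VALUES ('10.0.0.1', 200), ('10.0.0.2', 404);
CREATE TABLE DimIP (ID INTEGER PRIMARY KEY, IP TEXT UNIQUE);
CREATE TABLE DimStatusCode (ID INTEGER PRIMARY KEY, StatusCode INTEGER UNIQUE);
INSERT INTO DimIP (IP) SELECT DISTINCT RemoteHostName FROM tmp;
-- 404 is missing on purpose, so one lookup has no match.
INSERT INTO DimStatusCode (StatusCode) VALUES (200);
""")

# Same lookup query, parameterised only by the join type.
LOOKUP = """
SELECT DimIP.ID, DimStatusCode.ID
FROM tmp AS T
{join} DimIP ON DimIP.IP = T.RemoteHostName
{join} DimStatusCode ON DimStatusCode.StatusCode = T.StatusCode
"""

left_rows = conn.execute(LOOKUP.format(join="LEFT OUTER JOIN")).fetchall()
inner_rows = conn.execute(LOOKUP.format(join="INNER JOIN")).fetchall()
print(left_rows, inner_rows)
```

With LEFT OUTER JOIN both staging rows come back and the unmatched one carries a NULL key; with INNER JOIN that row is dropped.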

INSERT INTO
    dbo.Facts 
(
    DimDateID
,   DimIPID
,   DimRefererID
,   DimRequestID
,   DimStatusCodeID
,   DimUserAgentID
)
SELECT
    DimDate.ID 
,   DimIP.ID 
,   DimReferer.ID
,   DimRequest.ID 
,   DimStatusCode.ID
,   DimUserAgent.ID 
FROM
    dbo.tmp AS T  -- staging table: one fact row per log line
    -- LEFT OUTER JOIN keeps a log line whose value has no match (its ID is NULL);
    -- use INNER JOIN instead to drop such lines.
    LEFT OUTER JOIN
        dbo.DimDate 
        ON DimDate.Data = T.DateTime
    LEFT OUTER JOIN
        dbo.DimIP
        ON DimIP.IP = T.RemoteHostName
    LEFT OUTER JOIN
        dbo.DimReferer
        ON DimReferer.Referer = T.Referer
    LEFT OUTER JOIN
        dbo.DimRequest
        ON DimRequest.Request = T.Request
    LEFT OUTER JOIN
        dbo.DimStatusCode
        ON DimStatusCode.StatusCode = T.StatusCode
    LEFT OUTER JOIN
        dbo.DimUserAgent
        ON DimUserAgent.UserAgent = T.UserAgent;
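The whole load (dimensions first, then facts via the joined lookups) plus two sanity checks can be sketched as follows. This is a minimal sketch in SQLite via Python's sqlite3, not SQL Server; it keeps three of the six dimensions, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tmp (DateTime TEXT, RemoteHostName TEXT, StatusCode INTEGER);
INSERT INTO tmp VALUES
    ('2020-01-01', '10.0.0.1', 200),
    ('2020-01-01', '10.0.0.2', 404),
    ('2020-01-02', '10.0.0.1', 200);

CREATE TABLE DimDate       (ID INTEGER PRIMARY KEY, Data TEXT UNIQUE);
CREATE TABLE DimIP         (ID INTEGER PRIMARY KEY, IP TEXT UNIQUE);
CREATE TABLE DimStatusCode (ID INTEGER PRIMARY KEY, StatusCode INTEGER UNIQUE);
CREATE TABLE Facts (
    DimDateID       INTEGER REFERENCES DimDate(ID),
    DimIPID         INTEGER REFERENCES DimIP(ID),
    DimStatusCodeID INTEGER REFERENCES DimStatusCode(ID)
);

-- Step 1: dimensions, one row per distinct value (as in the question).
INSERT INTO DimDate (Data)             SELECT DISTINCT DateTime       FROM tmp;
INSERT INTO DimIP (IP)                 SELECT DISTINCT RemoteHostName FROM tmp;
INSERT INTO DimStatusCode (StatusCode) SELECT DISTINCT StatusCode     FROM tmp;

-- Step 2: facts, one row per staging row, keys found by joining (as in the answer).
INSERT INTO Facts (DimDateID, DimIPID, DimStatusCodeID)
SELECT DimDate.ID, DimIP.ID, DimStatusCode.ID
FROM tmp AS T
LEFT OUTER JOIN DimDate       ON DimDate.Data             = T.DateTime
LEFT OUTER JOIN DimIP         ON DimIP.IP                 = T.RemoteHostName
LEFT OUTER JOIN DimStatusCode ON DimStatusCode.StatusCode = T.StatusCode;
""")

# Sanity checks: one fact per staging row, and no key left NULL by a failed lookup.
staged = conn.execute("SELECT COUNT(*) FROM tmp").fetchone()[0]
loaded = conn.execute("SELECT COUNT(*) FROM Facts").fetchone()[0]
unmatched = conn.execute("""
    SELECT COUNT(*) FROM Facts
    WHERE DimDateID IS NULL OR DimIPID IS NULL OR DimStatusCodeID IS NULL
""").fetchone()[0]
print(staged, loaded, unmatched)  # prints: 3 3 0
```

If a dimension ever held the same value twice, the join would multiply fact rows, which the first check catches. Note also that NULL never equals NULL in a join condition, so a staging row with a NULL value (for example an empty Referer) always ends up with a NULL key.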

Finally, your fact table seems to be missing something measurable, unless you only intend to count the rows in Facts.
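A measure is a numeric fact column that gets aggregated per dimension member. Apache's common log format records the response size in bytes, which is a natural candidate. A minimal sketch in SQLite via Python's sqlite3, assuming a hypothetical BytesSent column loaded from the staging table alongside the keys; the values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimStatusCode (ID INTEGER PRIMARY KEY, StatusCode INTEGER UNIQUE);
INSERT INTO DimStatusCode (ID, StatusCode) VALUES (1, 200), (2, 404);
-- Hypothetical measure column BytesSent stored next to the foreign key.
CREATE TABLE Facts (
    DimStatusCodeID INTEGER REFERENCES DimStatusCode(ID),
    BytesSent       INTEGER
);
INSERT INTO Facts VALUES (1, 5120), (1, 2048), (2, 300);
""")

# Aggregate the measure (and a plain row count) per status code.
summary = conn.execute("""
    SELECT d.StatusCode, COUNT(*) AS Requests, SUM(f.BytesSent) AS TotalBytes
    FROM Facts AS f
    JOIN DimStatusCode AS d ON d.ID = f.DimStatusCodeID
    GROUP BY d.StatusCode
    ORDER BY d.StatusCode
""").fetchall()
print(summary)  # prints: [(200, 2, 7168), (404, 1, 300)]
```

The COUNT(*) column is the row-count measure you get even without such a column; BytesSent adds something to sum.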
