简体   繁体   中英

How to join a fact table to a (Kimball type 2?) slowly changing date in SAS

New to SQL - I'd like to join the fact table crselist to the crseinfo table to get the correct dimension info. I've been working on some correlated subqueries but none give the desired result (below). The crseinfo table says that beginning in 199610 Art 508 belongs to college 09 and should be called OkArt..which updates in 200220 and 200300. Crselist lists the courses actually taught.

data crseinfo ; 
input crsenme $ crsenum crsefx crsecollege $ crsedesc $9.;
cards;
ART 508 199610 09 OkArt
ART 508 200220 18 WowItsArt
ART 508 200300 18 SuperArt
;
run;

data crselist; 
input  crsenme $ crsenum term section $; 
cards;
ART 508 199610 01
ART 508 199610 02
ART 508 199610 03
ART 508 199710 01
ART 508 200220 01
ART 508 200220 02
ART 508 201020 01
ART 508 201120 01
;
run;

The desired result would then be:

data desired ; 
input  crsenme $ crsenum term section $ crsecollege $ crsedesc $9.;
cards;
ART 508 199610 01 09 OkArt
ART 508 199610 02 09 OkArt
ART 508 199610 03 09 OkArt
ART 508 199710 01 09 OkArt
ART 508 200220 01 18 WowItsArt
ART 508 200220 02 18 WowItsArt
ART 508 201020 01 18 SuperArt
ART 508 201120 01 18 SuperArt
;

Referring to the SAS help page ( http://web.utk.edu/sas/OnlineTutor/1.2/en/60477/m70/m70_52.htm ) it would seem like I could do something like:

proc sql ; 
select * 
from crseinfo a, crselist b
where a.crsenme eq b.crsenme and 
  a.crsenum eq b.crsenum and 
  b.term eq (select min(c.term) 
   from crselist c 
   where c.term ge a.crsefx )
   ;
quit;

But this does not work. I am interested in a SQL-based solution - Thank-you for your time.

You're nearly there. Rather than using a correlated subquery, I think it's simpler to do this using a combination of having and group by clauses:

proc sql noprint _method;
    create table desired2 as
        select a.*, b.crsecollege, b.crsedesc
            from  crselist a left join crseinfo b
                on a.crsenme = b.crsenme and a.crsenum = b.crsenum
                    where a.term ge b.crsefx
                        group by a.crsenme, a.crsenum, a.term
                            having b.crsefx = max(b.crsefx)
;
quit;

Slightly simpler version:

proc sql noprint _method;
    create table desired3 as
        select a.*, b.crsecollege, b.crsedesc
            from  crselist a, crseinfo b
                where       a.crsenme = b.crsenme 
                            and a.crsenum = b.crsenum 
                            and a.term ge b.crsefx
                    group by a.crsenme, a.crsenum, a.term, a.section
                        having b.crsefx = max(b.crsefx)
;
quit;

These produce the same result, but the latter one is more readily optimised into a hash join.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM