How can I assign values to dataset based on time and overlapping numerical ranges? - SAS

I have a credit card transaction dataset (let's call it "Trans") with transaction amount, zip code, and date. I have another dataset (let's call it "Key") that lists sales tax rates by date and geocode. The Key dataset also includes the range of zip codes associated with each geocode, represented by two variables: ZipStart and ZipEnd.

Because geocodes don't align with zip codes, some of the zip code ranges overlap. When a transaction's zip code falls in more than one range, I want to use the lowest sales tax rate that applies to that zip code.

Trans dataset:

TransAmount TransDate TransZip
$200 01/07/1998 90010
$12 02/09/2002 90022

Key dataset:

Geocode Rate StartDate EndDate ZipStart ZipEnd
1001 .0825 199701 200012 90001 90084
1001 .085 200101 200812 90001 90084
1002 .0825 199701 200012 90022 90024
1002 .08 200101 200812 90022 90024

Desired output:

TransAmount TransDate TransZip Rate
$200 01/07/1998 90010 .0825
$12 02/09/2002 90022 .08

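I used this basic SQL code in SAS, but it runs into the problem of overlapping zip codes: a transaction whose zip code falls in more than one range comes back once for each matching Key row.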

proc sql;
create table output as
select a.*, b.zipstart, b.zipend, b.startdate, b.enddate, b.rate
from Trans.CA_Zip_Cd_Testing a left join Key.CA_rates b
  on a.TransZip ge b.zipstart
  and a.TransZip le b.zipend
  and a.TransDate ge b.StartDate
  and a.TransDate le b.EndDate
;
quit;

The easiest way to handle this in the query itself is to add a correlated subquery that returns the minimum rate. Inside proc sql:

select t.TransAmount, t.TransDate, t.TransZip,
       (select min(k.Rate) from Key k
         where t.TransZip between k.ZipStart and k.ZipEnd
           and t.TransDate between k.StartDate and k.EndDate) as Rate
from Trans t;

You could also compute the minimum rate in a subquery and join on it.
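
One way to read "subquery and join on it" is sketched below. This is an interpretation, not necessarily what the answer had in mind. The inline view alias m and the output table name want are made up for illustration, and the sketch assumes that TransZip is comparable with ZipStart/ZipEnd and that TransDate is comparable with StartDate/EndDate as stored.

```sas
proc sql;
  create table want as
  select t.TransAmount, t.TransDate, t.TransZip, m.Rate
  from Trans t
  left join
       /* One row per distinct zip/date pair, carrying the lowest matching rate */
       (select t2.TransZip, t2.TransDate, min(k.Rate) as Rate
          from Trans t2, Key k
         where t2.TransZip between k.ZipStart and k.ZipEnd
           and t2.TransDate between k.StartDate and k.EndDate
         group by t2.TransZip, t2.TransDate) m
    on t.TransZip = m.TransZip
   and t.TransDate = m.TransDate;
quit;
```

Because the inline view is grouped by zip and date, each transaction matches at most one row of m, so the left join keeps one output row per transaction and leaves Rate missing when no Key row applies.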

The SAS SQL optimizer does well on some queries and poorly on others. The code below is a bit more complicated, but it will likely be faster. It loads the entire Key table into a hash object in memory, so it is subject to size constraints on that table.

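/* Give every Key row the same key value so that one lookup reaches all of them. */
data key;
   set key;
   dummy_key = 1;
run;

data want(drop=dummy_key geocode rate startDate endDate zipStart zipEnd rc i);
   if _n_ = 1 then do;
      if 0 then set key;   /* defines the Key variables without reading any rows */
      declare hash k (dataset:'key', multidata:'y');   /* multidata keeps every row for the key */
      k.defineKey('dummy_key');
      k.defineData('geocode','rate','startdate','enddate','zipstart','zipend');
      k.defineDone();
   end;
   call missing(of _all_);   /* clear the retained Key variables before the next transaction */
   set trans;
   dummy_key = 1;

   rc = k.find();   /* load the first Key row */
   /* Walk the Key rows. The bound of 1000 caps how many rows are scanned; raise it if Key is larger. */
   do i = 1 to 1000 while (rc = 0);
      /* Convert character zips to numbers. If the zips are already numeric,
         drop the conversions and compare the variables directly. */
      transZipNum = input(transZip, 8.);
      zipStartNum = input(zipStart, 8.);
      zipEndNum   = input(zipEnd, 8.);

      if startDate <= transDate <= endDate then do;
         if zipStartNum <= transZipNum <= zipEndNum then do;
            /* min() ignores missing values, so the first match sets rate_out */
            rate_out = min(rate_out, rate);
         end;
      end;
      rc = k.find_next();   /* move on to the next Key row */
   end;
run;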
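
Both the SQL and the hash approach compare TransDate directly with StartDate and EndDate. The sample Key data shows those bounds as year-month values such as 199701. If they are stored as plain numbers rather than SAS dates, they would need converting before the comparison makes sense. Below is a minimal sketch of that conversion, assuming the bounds are numeric YYYYMM values and TransDate is a SAS date. The names key_dates, StartDt, and EndDt are made up for illustration.

```sas
data key_dates;
   set key;
   /* Assumption: StartDate and EndDate are numeric YYYYMM values such as 199701 */
   StartDt = input(put(StartDate, 6.), yymmn6.);                        /* first day of the start month */
   EndDt   = intnx('month', input(put(EndDate, 6.), yymmn6.), 0, 'e');  /* last day of the end month */
   format StartDt EndDt date9.;
run;
```

With these in place, the range test would use StartDt and EndDt instead of StartDate and EndDate.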
