Merge and output selected data in SAS
I am trying to merge and output some datasets in SAS. The idea is very simple.
My data looks like this:
Data1 (target data)
RIC date
VOD 03/02/2014
BATS 03/02/2014
... ...
Data2 (sample data)
RIC date price
VOD 01/02/2014 50
VOD 03/02/2014 57
VOD 05/02/2014 64
VOD 06/02/2014 58
VOD 08/02/2014 64
VOD 10/02/2014 57
... ... ...
BATS 01/02/2014 70
BATS 03/02/2014 58
BATS 05/02/2014 67
BATS 06/02/2014 55
... ... ...
Now I need to merge Data1 with Data2 and keep only the target data within a (-1, +1) trading-day window. The final output should look like this:
RIC Trading_day_window date price
VOD -1 01/02/2014 50
VOD 0 03/02/2014 57
VOD +1 05/02/2014 64
BATS -1 01/02/2014 70
BATS 0 03/02/2014 58
BATS +1 05/02/2014 67
I know I have to use MERGE here first. But how do I keep only the target data within a (-1, +1) trading-day window?
I think I might use a subquery here. Can anyone help me out? Thanks!
Use a double DOW loop. In the first loop, find the record where the dates match. In the second, output the records you want.
Here is your sample data, properly sorted.
data data1 ;
input RIC $ date ;
informat date ddmmyy10.;
format date yymmdd10.;
cards;
BATS 03/02/2014
VOD 03/02/2014
;;;;
data data2;
input RIC $ date price ;
informat date ddmmyy10.;
format date yymmdd10.;
cards;
BATS 01/02/2014 70
BATS 03/02/2014 58
BATS 05/02/2014 67
BATS 06/02/2014 55
VOD 01/02/2014 50
VOD 03/02/2014 57
VOD 05/02/2014 64
VOD 06/02/2014 58
VOD 08/02/2014 64
VOD 10/02/2014 57
;;;;
Now just merge by RIC and DATE and find the matching records.
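data want ;
  /* Pass 1: read one RIC's records and note the position of the target date */
  do trading_day=1 by 1 until (last.ric);
    merge data1 (in=in1) data2;
    by ric date;
    if in1 then baseday = trading_day;
  end;
  /* Pass 2: reread the same RIC and output records within one trading day of the target */
  do trading_day=1 by 1 until (last.ric);
    merge data1 (in=in1) data2;
    by ric date;
    if baseday -1 <= trading_day <= baseday+1 then do;
      trading_day_window = trading_day-baseday;
      output;
    end;
  end;
run;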
proc print; run;
You can use a RETAIN statement in the data step.
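The RETAIN suggestion can be fleshed out roughly as follows. This is a minimal sketch of one possible reading, not the answerer's code. It assumes that every row of data2 is a trading day and that data1 holds one target date per RIC. The dataset and variable names `numbered`, `base`, `tday` and `baseday` are hypothetical. A retained counter numbers each RIC's trading days, so the window is measured in trading days rather than calendar days.

```sas
/* Sketch only: assumes every data2 row is a trading day and one target  */
/* date per RIC in data1. numbered, base, tday, baseday are hypothetical. */
data data1;
  input RIC $ date :ddmmyy10.;
  format date yymmdd10.;
  datalines;
BATS 03/02/2014
VOD 03/02/2014
;

data data2;
  input RIC $ date :ddmmyy10. price;
  format date yymmdd10.;
  datalines;
BATS 01/02/2014 70
BATS 03/02/2014 58
BATS 05/02/2014 67
BATS 06/02/2014 55
VOD 01/02/2014 50
VOD 03/02/2014 57
VOD 05/02/2014 64
VOD 06/02/2014 58
VOD 08/02/2014 64
VOD 10/02/2014 57
;

proc sort data=data1; by RIC date; run;
proc sort data=data2; by RIC date; run;

/* Step 1: number each RIC's trading days with a retained counter */
data numbered;
  set data2;
  by RIC;
  retain tday;
  if first.RIC then tday = 0;
  tday = tday + 1;
run;

/* Step 2: find the trading-day number of each target date */
data base(keep=RIC baseday);
  merge data1(in=in1) numbered(in=in2);
  by RIC date;
  if in1 and in2;
  baseday = tday;
run;

/* Step 3: one-to-many merge by RIC, keep rows within one trading day */
data want(keep=RIC trading_day_window date price);
  merge numbered base(in=inb);
  by RIC;
  if inb;
  trading_day_window = tday - baseday;
  if -1 <= trading_day_window <= 1;
run;

proc print data=want; run;
```

Because the offset comes from the row counter, a calendar gap of any length between two consecutive trading days still counts as one step. That is what the (-1, +1) trading-day window in the question asks for.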
A simple PROC SQL statement can do this, using BETWEEN in the join condition. I've coded +/- 2 days, as that seems to be the case in your example data. You can obviously adjust this to comply with whatever rule you use to calculate the trading window.
data data1;
input RIC $ date :ddmmyy10.;
format date date9.;
datalines;
VOD 03/02/2014
BATS 03/02/2014
;
run;
data data2;
input RIC $ date :ddmmyy10. price;
format date date9.;
datalines;
VOD 01/02/2014 50
VOD 03/02/2014 57
VOD 05/02/2014 64
VOD 06/02/2014 58
VOD 08/02/2014 64
VOD 10/02/2014 57
BATS 01/02/2014 70
BATS 03/02/2014 58
BATS 05/02/2014 67
BATS 06/02/2014 55
;
run;
proc sql;
create table want
as select
b.ric,
b.date-a.date as trading_day_window, /* calendar-day difference, e.g. -2, 0, 2 */
b.date,
b.price
from data1 as a
inner join
data2 as b
on a.ric=b.ric
and b.date between a.date-2 and a.date+2;
quit;
You've got some good answers here, so you'll have to experiment with them to pick the most efficient one.
I suspect your data1 is pretty small. If it is, I think this will be pretty efficient code, as it avoids sorting and potential SQL optimizer malarkey. Otherwise, the SQL solution seems the most practical to me.
proc sql noprint;
select count(*)
into :OBSCOUNT
from data1;
quit;
data want(drop=date_ref ric_ref);
set data2;
do i = 1 to &obscount.;
set data1 (rename=(date=date_ref ric=ric_ref)) point=i;
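/* Heuristic tuned to the sample spacing: a 2-calendar-day gap maps to +/-1 */
/* (a 1-day gap would map to 0); it does not count actual trading days.      */
trading_day_window = (abs(date-date_ref)-1)*sign(date-date_ref);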
if ric=ric_ref
and -1 <= trading_day_window <= 1
then output;
end;
run;