
Merge and output selected data in SAS

I am trying to merge and output some datasets in SAS. The idea is very simple.

My data looks like:

Data1 (Target Data)

RIC       date           
VOD     03/02/2014         
BATS    03/02/2014         
...       ...             

Data2 (Sample Data)

RIC       date           price
VOD     01/02/2014         50
VOD     03/02/2014         57
VOD     05/02/2014         64
VOD     06/02/2014         58
VOD     08/02/2014         64
VOD     10/02/2014         57
...       ...             ...
BATS    01/02/2014         70
BATS    03/02/2014         58
BATS    05/02/2014         67
BATS    06/02/2014         55
...       ...             ...

Now I need to merge Data1 with Data2 and keep only the rows that fall within a (-1, +1) trading-day window around each target date. The final output should look like this:

RIC  Trading_day_window     date           price
VOD         -1            01/02/2014         50
VOD          0            03/02/2014         57
VOD         +1            05/02/2014         64
BATS        -1            01/02/2014         70
BATS         0            03/02/2014         58
BATS        +1            05/02/2014         67

I know I have to use a merge here first. But how do I keep only the rows within the (-1, +1) trading-day window around the target date?

I think I might use a subquery here. Can anyone help me out? Thanks!

Use a double DOW loop. In the first loop, find the record where the dates match. In the second loop, output the records you want.

Here is your sample data, properly sorted by RIC and DATE (the BY statements below require this order).

data data1 ;
  input RIC $ date ;
  informat date ddmmyy10.;
  format date yymmdd10.;
cards;
BATS 03/02/2014
VOD 03/02/2014
;;;;
data data2;
  input RIC $ date price ;
  informat date ddmmyy10.;
  format date yymmdd10.;
cards;
BATS 01/02/2014 70
BATS 03/02/2014 58
BATS 05/02/2014 67
BATS 06/02/2014 55
VOD 01/02/2014 50
VOD 03/02/2014 57
VOD 05/02/2014 64
VOD 06/02/2014 58
VOD 08/02/2014 64
VOD 10/02/2014 57
;;;;

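Now just merge by RIC and DATE and find the matching records. Each MERGE statement keeps its own read position, so the second loop rereads the same RIC group that the first loop just scanned.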

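data want ;
  /* Pass 1: read one RIC's BY group and remember which row matches DATA1 */
  do trading_day=1 by 1 until (last.ric);
    merge data1 (in=in1) data2;
    by ric date;
    if in1 then baseday = trading_day;
  end;
  /* Pass 2: this second MERGE has its own read pointer, so it rereads
     the same BY group; output rows within one trading day of BASEDAY */
  do trading_day=1 by 1 until (last.ric);
    merge data1 (in=in1) data2;
    by ric date;
    if baseday -1 <= trading_day <= baseday+1 then do;
         trading_day_window = trading_day-baseday;
         output;
    end;
  end;
run;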
proc print; run;


You can use a RETAIN statement in the DATA step.
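One way this RETAIN suggestion could be fleshed out is sketched below; it is not the answerer's code. It assumes DATA1 and DATA2 are the sample datasets created above and that each RIC has at most one target date. A retained counter numbers the trading days within each RIC, a merge looks up the number of the target date, and a one-to-many merge keeps the rows within one trading day of it. The dataset names NUMBERED and BASE are illustrative.

```sas
/* Sketch only: assumes DATA1 (targets) and DATA2 (prices) exist,
   with at most one target date per RIC. */
proc sort data=data1; by ric date; run;
proc sort data=data2; by ric date; run;

/* Number the trading days within each RIC with a retained counter */
data numbered;
  set data2;
  by ric;
  retain trading_day;
  if first.ric then trading_day = 0;
  trading_day = trading_day + 1;   /* 1, 2, 3, ... within each RIC */
run;

/* Look up the trading-day number of each target date */
data base(keep=ric baseday);
  merge data1(in=in1) numbered(in=in2);
  by ric date;
  if in1 and in2;
  baseday = trading_day;
run;

/* One-to-many merge: BASEDAY is carried onto every row of its RIC */
data want(drop=trading_day baseday);
  merge numbered base(in=inb);
  by ric;
  if inb and abs(trading_day - baseday) <= 1;
  trading_day_window = trading_day - baseday;
run;

proc print data=want; run;
```

Because the window counts rows rather than calendar days, weekends and holidays do not need special handling. A RIC whose target date is missing from DATA2 simply produces no output.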

A simple PROC SQL step can do this, using BETWEEN in the join condition. I've coded +/- 2 days, as that seems to be the case in your example data; you can obviously adjust this to match whatever rule you use to calculate the trading window.

data data1;
input RIC $ date :ddmmyy10.;
format date date9.;
datalines; 
VOD     03/02/2014
BATS    03/02/2014
;
run;

data data2;
input RIC $ date :ddmmyy10. price;
format date date9.;
datalines;
VOD     01/02/2014         50
VOD     03/02/2014         57
VOD     05/02/2014         64
VOD     06/02/2014         58
VOD     08/02/2014         64
VOD     10/02/2014         57
BATS    01/02/2014         70
BATS    03/02/2014         58
BATS    05/02/2014         67
BATS    06/02/2014         55
;
run;

proc sql;
create table want 
as select 
    b.ric,
    b.date-a.date as trading_day_window,
    b.date,
    b.price
from data1 as a
     inner join
     data2 as b
     on a.ric=b.ric 
     and b.date between a.date-2 and a.date+2;
quit;
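Note that `b.date-a.date` above reports calendar-day offsets (-2, 0 and 2 for the sample), not trading-day counts. If the window should count rows in DATA2 instead, one possible sketch (not the original answer's code) first numbers the trading days per RIC with a self-join and then joins on that number. It assumes no duplicate RIC/date rows in DATA2 and that each target date appears in DATA2; NUMBERED is an illustrative table name. The self-join compares every pair of rows within a RIC, so it is only reasonable for modest table sizes.

```sas
proc sql;
  /* Trading-day number = how many DATA2 rows of the same RIC
     fall on or before this date (1 for the earliest date) */
  create table numbered as
  select b.ric, b.date, b.price, count(*) as trading_day
  from data2 as b, data2 as c
  where c.ric = b.ric and c.date <= b.date
  group by b.ric, b.date, b.price;

  /* T = the row matching the target date; N = rows within +/-1 of it */
  create table want as
  select n.ric,
         n.trading_day - t.trading_day as trading_day_window,
         n.date format=date9.,
         n.price
  from data1 as a, numbered as t, numbered as n
  where a.ric = t.ric and a.date = t.date
    and n.ric = t.ric
    and n.trading_day between t.trading_day - 1 and t.trading_day + 1
  order by n.ric, n.date;
quit;
```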

You've got some good answers here, so you'll have to experiment to find the most efficient one.

I suspect your data1 is pretty small. If it is, I think this will be fairly efficient code, as it avoids sorting and any potential SQL optimizer malarkey. Otherwise, the SQL solution seems the most practical to me.

proc sql noprint;
select count(*)
into :OBSCOUNT
from data1;
quit;

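data want(drop=date_ref ric_ref);
  set data2;
  /* For each DATA2 row, read every DATA1 row by direct access (POINT=);
     the POINT= variable I is not written to the output data set */
  do i = 1 to &obscount.;
    set data1 (rename=(date=date_ref ric=ric_ref)) point=i;
    /* sign(d)*(|d|-1): maps calendar offsets -2/0/+2 to -1/0/+1 */
    trading_day_window = (abs(date-date_ref)-1)*sign(date-date_ref);
    if ric=ric_ref
        and -1 <= trading_day_window <= 1
    then output;
  end;
run;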
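If DATA1 is small, a hash object is another way to avoid sorting: DATA1 is loaded into memory once, and each DATA2 row does a single keyed lookup instead of looping over all of DATA1. This is a minimal sketch, not the answerer's code; it assumes one target date per RIC and keeps the same window formula as the POINT= loop above. The hash name T and the variable RC are illustrative.

```sas
data want(drop=rc date_ref);
  if _n_ = 1 then do;
    /* Compile-time only: puts RIC and DATE_REF into the PDV for the hash */
    if 0 then set data1(rename=(date=date_ref));
    declare hash t(dataset: "data1(rename=(date=date_ref))");
    t.defineKey("ric");
    t.defineData("date_ref");
    t.defineDone();
  end;
  set data2;
  rc = t.find();                  /* 0 when this RIC has a target date */
  if rc = 0 then do;
    /* Same mapping as the POINT= version: offsets -2/0/+2 -> -1/0/+1 */
    trading_day_window = (abs(date - date_ref) - 1) * sign(date - date_ref);
    if -1 <= trading_day_window <= 1 then output;
  end;
run;
```

Like the POINT= version, this still works in calendar days, so it inherits the same assumption about the spacing of dates in the sample.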
