
SAS Lookup on Per Variable Basis

I have two tables in SAS, Table A and Table B. Suppose I want to write some SAS code to obtain the table "Desired Output." How would I do this?

Table A:

Observation  Var1   Var2
1            0      0
2            1      2
3            2      1
4            0      0

Table B:

Var     Level   Lookup
Var1    0       0.1
Var1    1       0.3
Var1    2       0.5
Var2    0       0.7
Var2    1       0.8
Var2    2       0.9

Desired output:

Observation Var1    Var2    Var1_new    Var2_new
1           0       0       0.1         0.7
2           1       2       0.3         0.9
3           2       1       0.5         0.8
4           0       0       0.1         0.7

I think this may involve SQL in SAS, but I'm not sure how to do it. Pseudo-code might look like this, but I don't know how to make it actually work:

data DATA_OUT.DESIRED_OUTPUT;
set DATA_IN.TABLE_A;
set PP.TABLE_B key=(Var Level);

Var1_new = TABLE_B["Var1" Var1][Lookup];
Var2_new = TABLE_B["Var2" Var2][Lookup];
run;

How would you achieve the desired output in SAS?

There are about a dozen ways to do this, but the best way for what you have is probably to make a format from the second dataset.

Formats are just relationships between one value and another value, which is exactly what you have here! Use the CNTLIN= option on PROC FORMAT to create the format from a dataset (your Table B), then apply it with PUT. (Then use INPUT to convert the result back to a number: formats only produce character values. You can't use an informat here because informats only take character values as input, so number-to-number always takes an extra step.)
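To make the PUT/INPUT round trip concrete, here is a minimal sketch that writes one such format by hand instead of loading it through CNTLIN=. The format name var1demo and the variables x, c and n are made up for illustration; the value/label pairs are the Var1 rows of Table B.

/* Hand-written equivalent of the Var1 rows of Table B (illustrative name) */
proc format;
  value var1demo
    0 = '0.1'
    1 = '0.3'
    2 = '0.5';
run;

data _null_;
  x = 1;
  c = put(x, var1demo.);   /* number -> character label */
  n = input(c, best12.);   /* character label -> number */
  put c= n=;               /* writes both values to the log */
run;

A CNTLIN= dataset describes the same value-to-label pairs, one row per pair, with the columns named FMTNAME, START and LABEL.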

You could also use a hash table lookup, a pair of data step merges, keyed SET statements, or SQL joins; there are a lot of options. But a format will be the fastest and the easiest to code here, IMO.

data a;
input Observation  Var1   Var2;
datalines;
1            0      0
2            1      2
3            2      1
4            0      0
;;;;
run;

data b;
input Var $  Level   Lookup;
datalines;
Var1    0       0.1
Var1    1       0.3
Var1    2       0.5
Var2    0       0.7
Var2    1       0.8
Var2    2       0.9
;;;;
run;

*Here we make a new dataset that has the required names for a format cntlin dataset;
data for_fmt;
  set b;
  rename var=fmtname 
         level=start
         lookup=label
  ;
  var = cats(var,'F');  *format names cannot end with numbers, so add an F at the end;
run;
proc format cntlin=for_fmt;  *read in the format;
quit;

*now use the formats;
data want;
  set a;
  var1_new = input(put(var1,var1f.),best12.);
  var2_new = input(put(var2,var2f.),best12.);
run;
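For comparison, the SQL join mentioned among the alternatives could be sketched roughly as follows. It assumes the datasets a and b created above; the output name want_sql and the aliases b1 and b2 are made up for this example. Table B is joined once per variable, and the left joins leave the new column missing when a level has no match.

proc sql;
  create table want_sql as
  select a.*,
         b1.lookup as var1_new,
         b2.lookup as var2_new
  from a
       /* one copy of b supplies the Var1 lookups */
       left join b as b1
         on b1.var = 'Var1' and b1.level = a.var1
       /* a second copy of b supplies the Var2 lookups */
       left join b as b2
         on b2.var = 'Var2' and b2.level = a.var2
  order by a.observation;
quit;

This needs one extra join per variable, which is part of why the format approach tends to be less code as the number of variables grows.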

Here is a method using a hash object to store your Table B.

data A ;
 input var1 var2;
cards;
0 0
1 2
2 1
0 0
;    
data B;
  input Var :$32. Level Lookup;
cards;
Var1 0 0.1
Var1 1 0.3
Var1 2 0.5
Var2 0 0.7
Var2 1 0.8
Var2 2 0.9
;

data want;
  if _n_=1 then do;
    if 0 then set b;
    dcl hash h(dataset: 'b');
    h.definekey('var','level');
    h.definedata('lookup');
    h.definedone();
  end;
  set a;
  * find() returns 0 on a match and then loads LOOKUP from the hash entry;
  if h.find(key:'Var1',key:var1)=0 then var1_new=lookup;
  if h.find(key:'Var2',key:var2)=0 then var2_new=lookup;
  drop var level lookup;
run;
