SAS loop through 56 character string extract every two characters

I have a couple million records, each with a string like
"00 00 01 00 00 01 00 01 00 00 00 00 01 01 00 01 00 00 00 00 01"
The string has a length of 56. All positions are filled with either a 0 or a 1.
My job is to parse the string of each record every two positions
(there are no spaces in the real data; they are shown only for clarification).

If there is a 1 in position two, that means increment var1 by 1.
If there is ALSO a 1 in position four (I don't care about the leading "0"s
in positions 1/3/5/7...55), increment var2 by 1, and so on, up to 28 variables.

The entire 56-character string must be parsed every two characters. Potentially
there could be 28 variables that have to be incremented (not realistic;
most likely there are only five or six), and the 1s could be found in any part of the
string, beginning to end (as long as they are in positions 2/4/6/8 and so on, up to 56).

This is what my boss gave me:
if substr(BigString,2,1)='1' then var1+1;

OK. Fine.
A) There are 27 more places to evaluate in the string.
B) There are a couple million records.

28 nested if-then-do blocks don't sound like an answer (it's all I could think of). At least not to me.
Thanks.

If I understood the problem correctly, this could be the solution (edited, second version):

/* example input rows */
data test;
a="00000100000100010000000001010001000000000100000000011110";output;
a="10000100000100010000000001010001000000000100011100011101";output;
a="01000100000100010000000001010001000000000100000001000000";output;
a="10100100000100010000000001010001000000000111111111111110";output;
a="01100100000100010000000001010001000000000101010101010101";output;
a="00000100000100010000000001010001000000000100001100101010";output;
run;

/* work row by row: one 0/1 flag per even position (2, 4, ..., 56) */
%macro x;
data test_output;
 set test;
    %do i = 1 %to 28;
        /* var&i is 1 when position 2*&i of the string holds '1', else 0 */
        var&i = input(substr(a, %eval(2*&i), 1), best8.);
    %end;
run;
%mend x;
%x;

/* results (first 7 of the 28 flags shown):
a                                                          var1 var2 var3 var4 var5 var6 var7   .   .
00000100000100010000000001010001000000000100000000011110    0   0   1   0   0   1   0    .......
10000100000100010000000001010001000000000100011100011101    0   0   1   0   0   1   0    .......
01000100000100010000000001010001000000000100000001000000    1   0   1   0   0   1   0    .......
10100100000100010000000001010001000000000111111111111110    0   0   1   0   0   1   0    .......
01100100000100010000000001010001000000000101010101010101    1   0   1   0   0   1   0    .......
00000100000100010000000001010001000000000100001100101010    0   0   1   0   0   1   0    .......
*/

I think the author is looking for a do-loop method, so my suggestion is a macro %do loop or an array statement in a data step.

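data _null_;
    text = '000001000001000100000000010100010000000001';

    array Var[28];                       /* creates Var1-Var28 */
    do i = 1 to dim(Var);
        /* add 1 when character 2*i is '1'; SUBSTRN returns an empty
           string for positions past the end of text, which adds 0 */
        Var[i] + (substrn(text,i*2,1)='1');
        put i = Var[i]=;
    end;
run;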

Kind of easy, isn't it?

Put the variables that may be incremented into an array, one element per pair of characters in the string. A DO loop can then examine each part of the string and conditionally apply the needed increment.

The SUM statement <variable>+<expression> means the variable's value is automatically retained from row to row.
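A minimal sketch of that retention, assuming a made-up three-row data set (the names demo, flag, running_total and not_retained are illustrative and do not come from the question). The sum statement keeps its value across rows, while a variable built with an ordinary assignment is reset to missing at the start of each row:

```sas
/* Hypothetical three-row input, for illustration only */
data demo;
  input flag;
  datalines;
1
0
1
;
run;

data totals;
  set demo;
  /* sum statement: starts at 0 and is retained from row to row */
  running_total + flag;
  /* ordinary assignment: not_retained is missing at the top of each row,
     so SUM() just returns the current flag */
  not_retained = sum(not_retained, flag);
run;

proc print data=totals;
run;
```

With this input, running_total should come out as 1, 1, 2 and not_retained as 1, 0, 1.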

Because the variables are retained, you might want only the final var1-var28 values at the last row of the data. The question does not give enough information about what is to be done with the var<n> variables.

Example:

Presume the string is named op_string (op for operation). The code relies on a logical evaluation returning 1 for True and 0 for False.

data want(keep=var1-var28); 
  set have end=done;
  array var var1-var28;
  do index = 1 to 28;
    * The third SUBSTR argument limits the extract to a single character;
    var(index) + (substr(op_string, 2 * index, 1) = '1');  * Add 0 or 1 according to logic eval;
  end;
  if done;  * output one row at the end of the data set;
run;
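To try this approach without the real data, here is a self-contained sketch. The have data set is an assumption built from two of the sample strings posted earlier in the thread, and the array is named v here only to avoid reusing the name of the VAR function. Each test extracts exactly one character at an even position:

```sas
/* Hypothetical input: two 56-character sample strings */
data have;
  length op_string $56;
  op_string = "00000100000100010000000001010001000000000100000000011110"; output;
  op_string = "01000100000100010000000001010001000000000100000001000000"; output;
run;

data want(keep=var1-var28);
  set have end=done;
  array v[28] var1-var28;
  do index = 1 to 28;
    /* (comparison) evaluates to 1 or 0; the sum statement retains the total */
    v[index] + (substr(op_string, 2 * index, 1) = '1');
  end;
  if done;   /* keep only the final totals, written at the last input row */
run;

proc print data=want;
run;
```

For these two rows, var1 should end at 1 (only the second string has a 1 in position 2) and var3 at 2 (both have a 1 in position 6).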

Then use COUNTC() to count the number of 1's in the string.

data want;
set have;
value = countc(op_string, '1');
run;
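Note that COUNTC returns one total per record rather than 28 separate counters, and it counts a 1 in any position, odd or even. A small sketch on a made-up 4-character string (the names total_ones, even_ones and pos are illustrative) compares it with a count restricted to even positions:

```sas
data _null_;
  op_string = "1001";                      /* hypothetical example string */
  total_ones = countc(op_string, '1');     /* counts the 1s in positions 1 and 4 */
  even_ones = 0;
  do pos = 2 to length(op_string) by 2;    /* positions 2 and 4 only */
    even_ones = even_ones + (substr(op_string, pos, 1) = '1');
  end;
  put total_ones= even_ones=;              /* writes total_ones=2 even_ones=1 to the log */
run;
```

The two counts agree only when the odd positions never contain a 1.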
