
How can I get the first and the last MISSING value from a particular ROW using SAS

I have the following problem: I want to identify the first and the last missing values in each row. Take the following code as an example:

data example;
  input id var1 var2 var3 var4 var5 var6 var7 var8 var9 var10 var11 var12;
cards;

A   1 2 3 . . . . . 1 1 1 3
B   3 3 2 1 3 2 1 . . . . .
C   . . . . 1 2 3 1 2 3 2 .
D   3 . 1 . 3 . 1 . 3 . 1 .
F   1 3 . . 1 3 . . 1 3 . .
E   3 2 1 . . . . . 1 1 1 3
G   3 3 2 1 3 2 1 . . . . .
H   . . . . . 1 2 3 1 2 3 2
I   3 . 1 . 3 . 1 . 3 . 1 .
J   A E . . A E . . A E . . 
;

In row A the first is var4 and the last is var8

In row D the first is var2 and the last is var12

Thank you.

Seems pretty simple using an ARRAY and a couple of DO loops.

Let's clean up your data step and add an example with no missing values.

missing abcdefghijklmnopqrstuvwxyz;
data example;
  input id $ var1-var12;
cards;
A   1 2 3 . . . . . 1 1 1 3
B   3 3 2 1 3 2 1 . . . . .
C   . . . . 1 2 3 1 2 3 2 .
D   3 . 1 . 3 . 1 . 3 . 1 .
F   1 3 . . 1 3 . . 1 3 . .
E   3 2 1 . . . . . 1 1 1 3
G   3 3 2 1 3 2 1 . . . . .
H   . . . . . 1 2 3 1 2 3 2
I   3 . 1 . 3 . 1 . 3 . 1 .
J   A E . . A E . . A E . . 
K   1 2 3 4 5 6 7 8 9 10 11 12
;

Then, in a data step, create an array of the variables you want to check (in the order you want them checked) and use two DO loops. When counting up, make sure to trap the case where no missing value is found. By default the result will be N+1, so you may want it to be zero instead, which is what you already get when counting down.

data want;
  set example;
  array vars var1-var12;
  do first=1 to 12 while(not missing(vars[first])); end;
  if first>12 then first=0;
  do last=12 to 1 by -1 while(not missing(vars[last])); end;
run;
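
Since the question asks for variable names (var4, var8) rather than positions, here is a minimal sketch of the same two loops that also looks up the names with VNAME. It assumes the cleaned-up example dataset above. The names want_names, first_name and last_name are illustrative, not part of the original answer, and dim(vars) simply replaces the hard-coded 12.

```sas
data want_names;
  set example;
  array vars var1-var12;
  length first_name last_name $32;  /* hypothetical output variables */
  /* scan upward; the index ends at dim(vars)+1 when nothing is missing */
  do first=1 to dim(vars) while(not missing(vars[first])); end;
  if first>dim(vars) then first=0;
  /* scan downward; the index ends at 0 when nothing is missing */
  do last=dim(vars) to 1 by -1 while(not missing(vars[last])); end;
  /* vname() returns the name of the array element; skip it when index is 0 */
  if first>0 then first_name=vname(vars[first]);
  if last>0 then last_name=vname(vars[last]);
run;
```

For a row with no missing values, first and last are both 0 and the two name variables stay blank.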


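Concatenate all of your values together into a string, then find the position of the first . and the last . in that string. This relies on every value printing as a single character, so it breaks as soon as a value such as 10 appears.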

data want;
    set example;

    sequence_char = cats(of var1-var12);

    missing_start = find(sequence_char, '.');
    missing_end   = length(sequence_char) - find(strip(reverse(sequence_char)), '.') + 1;
    
run;

Output:

id  sequence_char   missing_start   missing_end
A   123.....1113    4               8
B   3321321.....    8               12
C   ....1231232.    1               12
D   3.1.3.1.3.1.    2               12
F   13..13..13..    3               12
E   321.....1113    4               8
G   3321321.....    8               12
H   .....1231232    1               5
I   3.1.3.1.3.1.    2               12
J   ............    1               12
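
If values can be more than one character wide, or a row may have no missing values at all, one way around both problems is to build a mask with exactly one character per variable before searching. This is only a sketch, assuming the example dataset from the question. The names want_mask and mask and the 'x' placeholder are illustrative choices.

```sas
data want_mask;
  set example;
  array vars var1-var12;
  length mask $12;  /* one character per array element; adjust if the array grows */
  do i=1 to dim(vars);
    /* '.' marks a missing value, 'x' marks anything else */
    substr(mask,i,1)=ifc(missing(vars[i]),'.','x');
  end;
  /* find() returns 0 when the substring does not occur */
  missing_start=find(mask,'.');
  /* a negative start position makes find() search right to left */
  missing_end=find(mask,'.',-dim(vars));
  drop i;
run;
```

Because the mask always has one character per variable, the positions line up with the array indices regardless of how wide the values are, and both results are 0 when nothing is missing.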

I assume here that the letters are NOT meant to be counted as missing. If they are, then replace "if v(i) = ." with "if missing(v(i))".

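data want (drop=i);
  set example;
  array v(12) var1-var12;
  do i = 1 to 12;
    if v(i) = . then do;
      /* min() and max() ignore missing arguments, so the first hit initializes both */
      first_missing = min(first_missing,i);
      last_missing = max(last_missing,i);
    end;
  end;
  /* both variables stay missing for a row with no missing values */
run;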
