
Cumulative sum of columns in SAS

I am trying to compute a cumulative sum across variables:

N1 N2 N3
1  1  1
1  .  1
1  1  .

Want

N1 N2 N3 B1 B2 B3
1  1  1  1  2  3
1  .  1  1  1  2
1  1  .  1  2  2

The array approach I am trying does not seem to work at all:

data temp2; 
    set temp; 
    array hh(*) N:; 
    array bb(3); 
    do i=1 to dim(hh);
        bb(i)=bb(i)+hh(i+1);
    end; 
run;

I don't want to transpose the data and compute the cumulative sum that way.

First, there is an error in your algorithm: a cumulative value

  • should not be calculated from itself and the next input value,
  • but from the previous cumulative value and the corresponding input value.

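What you need instead is bb(i) = bb(i-1) + hh(i). Of course, this does not work when i is 1, because there is no bb(0), so you set bb(1) from the first input value and apply the formula from i = 2 on. (Your loop also reads hh(i+1), which is beyond the end of the array when i = dim(hh).)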
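Written as a recurrence, with N_i the i-th input column, B_i the i-th cumulative column and k the number of columns (notation introduced here only for illustration):

```latex
B_1 = N_1, \qquad B_i = B_{i-1} + N_i \quad (i = 2, \dots, k),
\qquad\text{so}\qquad B_i = \sum_{j=1}^{i} N_j .
```

For the first row (1, 1, 1) this gives B = (1, 2, 3), matching the desired output.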

Second, you need to handle missing values, because + returns a missing value as soon as one operand is missing. You need something like COALESCE in SQL. Let us use ifn for that, a function that returns a numeric value. The first argument is a condition, the second is the value returned if the condition is true, and the third is the value returned if the condition is false.
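To check the missing-value handling in isolation, a one-row data step can compare ifn with two other standard SAS functions that give the same result here: COALESCE (returns the first non-missing argument) and SUM (ignores missing arguments). This is an illustrative sketch; the data set name check_missing is hypothetical.

```sas
/* Sketch: three ways to treat a missing numeric value as 0 */
data check_missing;
    x = .;                                 /* a missing input value         */
    via_ifn      = ifn(missing(x), 0, x);  /* the approach used in this answer */
    via_coalesce = coalesce(x, 0);         /* first non-missing argument    */
    via_sum      = sum(x, 0);              /* SUM skips missing arguments   */
    put via_ifn= via_coalesce= via_sum=;   /* writes the values to the log  */
run;
```

The log should show via_ifn=0 via_coalesce=0 via_sum=0.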

Putting it all together:

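data AFTER;
    set TEMP;
    array hh(*) N:;      /* input columns N1, N2, N3         */
    array bb(3);         /* cumulative columns bb1, bb2, bb3 */

    /* first cumulative value: the first input, missing treated as 0 */
    bb1 = ifn(missing(N1), 0, N1);
    /* each later value: previous cumulative value + current input */
    do i=2 to 3;
        bb(i) = bb(i-1) + ifn(missing(hh(i)), 0, hh(i));
    end; 

    drop i;
run; 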

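The downside of this solution is the hard-coded 3 in array bb(3) and do i=2 to 3, which user3658367 tried to solve with dim(hh). Unfortunately, dim(hh) only works for the loop bound: the dimension in an ARRAY statement must be a constant that is known when the data step is compiled.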

So this is better:

/* Count the variables whose names start with N */
proc sql noprint; 
    select count(*) 
      into :B_count 
      from sasHelp.vColumn 
     where libname eq 'WORK'
       and memName eq 'TEMP'
       and name like 'N%';
quit;

data AFTER;
    set TEMP;
    array hh(*) N:; 
    array bb(&B_count); 

    bb1 = ifn(missing(N1), 0, N1);
    do i=2 to &B_count;
        bb(i) = bb(i-1) + ifn(missing(hh(i)), 0, hh(i));
    end; 

    drop i;
run; 

I added the condition name like 'N%' because I assume that in your real-life problem TEMP has other variables than the ones you are cumulating.

About the comments on this answer: if you were not involved in this post from the start, you can ignore them; I have incorporated them into the text above.

(To the authors of these comments: thanks for your input.)

I am not very comfortable with arrays, so I would use a macro to do the work.

data temp;
input N1 N2 N3;
datalines;
1 1 1
1 . 1
1 1 .
;
run;

options mprint mlogic;

proc sql;
select name into:cols separated by ','
from dictionary.columns
where libname = upcase("work") and memname = upcase("temp") and upcase(name) like 'N%';
quit;

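%macro cumul_colsum;

/* Start from a copy of the input data */
data temp2;
set temp;
run;

/* One data step per column: B&i = sum of N1 up to the i-th column, ignoring missing values */
%do i = 1 %to %sysfunc(countw(%superq(cols)));

    %let var = %scan(%superq(cols),&i,%str(,));
    %put |&var|;

    data temp2;
    set temp2;
    B&i. = sum(of N1-&var.);
    run;

%end;
%mend cumul_colsum;
%cumul_colsum;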

This gives the desired result that the OP asked for. As in Dirk's answer, I used like 'N%' to feed the column names to the macro, and a %do loop to create the columns with cumulative sums. Because each column is computed in a separate data step, this might take a while for huge datasets, but it is easier to see what is happening (with options mprint nonotes;).
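Since %cumul_colsum rewrites TEMP2 once per column, a variant that generates all the assignments inside a single data step reads the data only once. The sketch below is an illustration, not the answer's own code: the macro name cumul_colsum_1step is hypothetical, and like the original it assumes the columns form a numbered range starting at N1. The setup steps are repeated so the snippet runs on its own.

```sas
/* Setup repeated from above so this sketch runs on its own */
data temp;
    input N1 N2 N3;
    datalines;
1 1 1
1 . 1
1 1 .
;
run;

proc sql noprint;
    select name into :cols separated by ','
      from dictionary.columns
     where libname = 'WORK' and memname = 'TEMP' and upcase(name) like 'N%';
quit;

/* Hypothetical variant: every B column is generated in ONE data step */
%macro cumul_colsum_1step;
    %local i var;
    data temp2;
        set temp;
        %do i = 1 %to %sysfunc(countw(%superq(cols), %str(,)));
            %let var = %scan(%superq(cols), &i, %str(,));
            B&i = sum(of N1-&var);   /* running total from N1 to the current column */
        %end;
    run;
%mend cumul_colsum_1step;
%cumul_colsum_1step;
```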

Follow @Reeza's suggestion of using sum() instead of + to handle missing values: sum() ignores missing arguments, whereas + returns missing if any operand is missing. Something like below:

data have;
input N1 N2 N3;
datalines;
1  1  1
1  .  1
1  1  .
;
run;

data temp2; 
set have; 
array hh(*) N1 N2 N3;      /* input columns      */
array bb(3) B1 B2 B3;      /* cumulative columns */
do i = 1 to dim(hh);
    if i = 1 then bb(i) = hh(i);
    else bb(i) = sum(bb(i-1), hh(i));   /* sum() skips missing values */
end; 
drop i;
run;
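The sum() approach can be combined with the PROC SQL count from the first answer to remove the hard-coded 3 as well. A minimal sketch, assuming the input is WORK.HAVE and that the only variables whose names start with N are the ones to accumulate; the macro variable n_cols and the output data set WANT are names chosen for illustration.

```sas
/* Input as in the answer above (repeated so the sketch runs alone) */
data have;
    input N1 N2 N3;
    datalines;
1 1 1
1 . 1
1 1 .
;
run;

/* Count the N* columns; TRIMMED removes leading blanks from the value */
proc sql noprint;
    select count(*) into :n_cols trimmed
      from sashelp.vcolumn
     where libname = 'WORK' and memname = 'HAVE' and name like 'N%';
quit;

data want;
    set have;
    array hh(*) N:;                   /* input columns N1-Nk      */
    array bb(&n_cols) B1-B&n_cols;    /* cumulative columns B1-Bk */
    bb(1) = hh(1);
    do i = 2 to dim(hh);
        bb(i) = sum(bb(i-1), hh(i));  /* sum() skips missing values */
    end;
    drop i;
run;
```

The TRIMMED option matters here: without it the macro variable may carry leading blanks, and B1-B&n_cols would not resolve to a valid variable list.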

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM