SAS: sum all values except one

I'm working in SAS and I'm trying to sum all observations, leaving out one each time. For example, if I have:

Count    Name      Grade
1        Sam        90
2        Adam       100
3        John       80
4        Max        60
5        Andrea     70

I want to output a value for Sam that is the sum of all grades but his own, and a value for Adam that is a sum of all grades but his own - etc.

Any ideas? Thanks!

You can do it in a single PROC SQL step instead, using the calculated keyword:

data have;
input Count    Name  $    Grade;
datalines;
1        Sam        90
2        Adam       100
3        John       80
4        Max        60
5        Andrea     70
;;;;
run;

proc sql;
    create table want as
    select *, sum(grade) as all_grades, calculated all_grades-grade as minus_grade
    from have;
quit;

proc sql;
create table temp as select
sum(grade) as all_grades
from orig_data;
quit;

proc sql;
create table temp2 as select
a.count,
a.name,
a.grade,
(b.all_grades-a.grade) as sum_other_grades
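/* temp has exactly one row, so this cross join attaches the total to every row; */
/* a left join with no ON clause is a syntax error in PROC SQL */
from orig_data a, temp b;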
quit;

I haven't tested it, but the above should work. It creates a new dataset, temp, that holds the sum of all grades, then joins that back to the original data to create a new table in which sum_other_grades is the sum of all grades less the current student's grade.

Here's a nearly one pass solution (it will be about the same speed as a one pass solution if the dataset fits in the read buffer). I actually calculate the mean here instead of just the sum, as I feel that's a more interesting result (and the sum is of course the mean without the division).

data have;
input Count    Name  $    Grade;
datalines;
1        Sam        90
2        Adam       100
3        John       80
4        Max        60
5        Andrea     70
;;;;
run;

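data want;
  retain grademean;
  /* First iteration only: read every row by direct access to total the grades */
  if _n_=1 then do;
      do _n_ = 1 to nobs_have;
        set have(keep=grade) point=_n_ nobs=nobs_have;
        gradesum+grade;
      end;
      grademean=gradesum/nobs_have;
  end;
  /* Main pass: read each row in order */
  set have;
  /* Rebuild the sum from the mean, remove this row's grade, average the rest */
  grade_noti = ((grademean*nobs_have)-grade)/(nobs_have-1);
run;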

Calculate the mean, then for each record subtract the portion that record contributed to the mean. This is a super useful technique for stat testing when you want to compare a record to the rest of the population, and you have a complicated class combination where you'd rather do the mean first. In those cases you use PROC MEANS first and then merge it on, then do this subtraction.
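
The PROC MEANS-then-merge variant described above might look something like the following. This is only a sketch under assumptions: the Section class variable and the names have_class, class_stats, want_class, grade_sum and grade_n are made up for illustration and are not part of the original data.

```sas
/* Hypothetical input: the same students, split into two made-up sections */
data have_class;
  input Section $ Name $ Grade;
  datalines;
A Sam 90
A Adam 100
A John 80
B Max 60
B Andrea 70
;
run;

/* Step 1: one row per section holding the sum and the non-missing count */
proc means data=have_class noprint nway;
  class Section;
  var Grade;
  output out=class_stats(drop=_type_ _freq_) sum=grade_sum n=grade_n;
run;

/* Step 2: the BY merge needs the detail data sorted by the class variable */
proc sort data=have_class;
  by Section;
run;

/* Step 3: merge the stats on and remove each row's own contribution */
data want_class;
  merge have_class class_stats;
  by Section;
  /* Left missing when a section has a single member (nothing to compare to) */
  if grade_n > 1 then grade_noti = (grade_sum - Grade) / (grade_n - 1);
run;
```

With this made-up split, Sam's grade_noti is (270 - 90) / 2 = 90, the mean of the other two students in section A. Carrying the sum and count rather than the mean avoids the multiply-back step in the formula above, but the result is the same.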

This solution takes each observation of your starting dataset, and then loops through the same dataset summing up grade values for any records with different names. So beginning with 'Sam', we only add to the oth_g variable when we find names that are NOT 'Sam':

data want;
  set have;
  oth_g=0;
  do i=1 to n;
    set have 
      (keep=name grade rename=(name=name_loop grade=grade_loop)) 
      nobs=n point=i;
    if name^=name_loop then oth_g+grade_loop;
  end;
  drop grade_loop name_loop i n;
run;
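
One caveat with comparing on name: if two students share a name, both of their grades are left out of each other's totals. A possible variation, assuming the have dataset from the earlier answers and that the outer SET reads rows in their stored order (so _n_ is the current row number), is to compare row positions instead. The output name want_by_row is just a placeholder.

```sas
data want_by_row;
  set have;
  oth_g = 0;
  do i = 1 to n;
    /* Direct-access read of row i; only the grade is needed here */
    set have(keep=grade rename=(grade=grade_loop)) nobs=n point=i;
    /* Skip the row currently held by the outer SET, whatever its name */
    if i ne _n_ then oth_g + grade_loop;
  end;
  drop grade_loop i;
run;
```

Like the name-based version, this rereads the whole dataset once per observation, so it is only sensible for small tables.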

This is a slight modification to the answer @Reese provided above.

proc sql;
    create table want as
    select *,
           (select sum(grade) from have) as all_grades,
           calculated all_grades - grade as minus_grade
    from have;
quit;

I've rearranged it this way to avoid the below message being printed to the log:

NOTE: The query requires remerging summary statistics back with the original data.

If you see the above message, it almost always means that you have made a mistake. If you actually did mean to remerge summary stats back with the original data, you should do so explicitly (like I have done above by refactoring @Reese's query).

Personally I think the refactored version is also easier to understand.
