
Aggregate NAs in R

I'm having trouble handling NAs while calculating aggregated means. Please see the following code:

tab=data.frame(a=c(1:3,1:3), b=c(1,2,NA,3,NA,NA))
tab
  a  b
1 1  1
2 2  2
3 3 NA
4 1  3
5 2 NA
6 3 NA

attach(tab)
aggregate(b, by=list(a), data=tab, FUN=mean, na.rm=TRUE)
  Group.1   x
1       1   2
2       2   2
3       3 NaN

I want NA instead of NaN when a group's values are all NA, i.e. I want the output to be:

  Group.1   x
1       1   2
2       2   2
3       3  NA

I tried using a custom function:

adjmean=function(x) {if(all(is.na(x))) NA else mean(x,na.rm=TRUE)}

However, I get the following error:

aggregate(b, by=list(a), data=tab, FUN=adjmean)

Error in FUN(X[[1L]], ...) : 
  unused argument (data = list(a = c(1, 2, 3, 1, 2, 3), b = c(1, 2, NA, 3, NA, NA)))

In short, if a group's values are all NA, I want NA as the output instead of NaN. If only some of the values are NA, it should compute the mean of the rest, ignoring the NAs.

Any help would be appreciated.

Thanks

This is very close to what you had, but replaces mean(x, na.rm=TRUE) with a custom function which either computes the mean of the non-NA values, or supplies NA itself:

R> with(tab, 
        aggregate(b, by=list(a), FUN=function(x) 
             if (any(is.finite(z<-na.omit(x)))) mean(z) else NA))
  Group.1  x
1       1  2
2       2  2
3       3 NA
R> 

That is really one line, but I broke it up to make it fit into the SO display.

And you already had a similar idea, but I altered the function a bit more to return suitable values in all cases.
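As a rough illustration of where the NaN comes from: with na.rm=TRUE, a group that is entirely NA is reduced to an empty vector, and the mean of an empty vector is NaN. The sketch below shows this and then writes the same guard as a named helper; the name mean_or_na is hypothetical and not from the original posts.

```r
# Why the NaN appears: removing every NA leaves nothing to average.
all_missing <- c(NA_real_, NA_real_)
empty_mean  <- mean(numeric(0))               # NaN
nan_result  <- mean(all_missing, na.rm = TRUE) # NaN, for the same reason

# Hypothetical helper: average the non-missing values, or return NA
# when no values remain after dropping the missing ones.
mean_or_na <- function(x) {
  z <- x[!is.na(x)]
  if (length(z) > 0) mean(z) else NA_real_
}

tab <- data.frame(a = c(1:3, 1:3), b = c(1, 2, NA, 3, NA, NA))
res <- aggregate(tab$b, by = list(tab$a), FUN = mean_or_na)
print(res)
```

Referring to the columns as tab$b and tab$a avoids the need for attach() or with().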

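There is nothing wrong with your function. What is wrong is that you are passing a data argument, which the default method for aggregate does not have. The unmatched argument is forwarded to FUN, and adjmean rejects it. (Your first call did not fail because mean accepts extra arguments through ... and silently ignores them.)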

adjmean = function(x) {if(all(is.na(x))) NA else mean(x,na.rm=TRUE)}
attach(tab)  ## Just because you did it. I don't recommend this.

## Your error
aggregate(b, by=list(a), data=tab, FUN=adjmean)
# Error in FUN(X[[i]], ...) : 
#   unused argument (data = list(a = c(1, 2, 3, 1, 2, 3), b = c(1, 2, NA, 3, NA, NA)))

## Dropping the "data" argument
aggregate(b, list(a), FUN = adjmean)
#   Group.1  x
# 1       1  2
# 2       2  2
# 3       3 NA
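To see why the same stray argument is harmless with mean but fatal with adjmean, here is a small sketch that calls both functions directly with an extra data argument. The value "ignored" is just a placeholder for illustration.

```r
adjmean <- function(x) if (all(is.na(x))) NA else mean(x, na.rm = TRUE)
x <- c(1, 3)

# mean() is generic and mean.default() has `...` in its signature,
# so the unmatched `data` argument is absorbed and never used.
ok <- mean(x, data = "ignored")

# adjmean() takes only `x`, so the same call raises an error;
# tryCatch() captures the message instead of stopping the script.
msg <- tryCatch(adjmean(x, data = "ignored"),
                error = function(e) conditionMessage(e))
print(ok)
print(msg)
```

This is the same situation aggregate creates internally when it forwards its unmatched arguments to FUN.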

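If you want to use the data argument, use the formula method for aggregate. However, this method handles NA differently: its default, na.action = na.omit, drops every row containing an NA before grouping, so group 3 disappears from the result. To keep it, pass the additional argument na.action = na.pass.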

Example:

detach(tab) ## I don't like having things attached
aggregate(b ~ a, data = tab, adjmean)
#   a b
# 1 1 2
# 2 2 2
aggregate(b ~ a, data = tab, adjmean, na.action = na.pass)
#   a  b
# 1 1  2
# 2 2  2
# 3 3 NA
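The missing third row in the first result can be reproduced by applying na.omit to the data by hand, which is approximately what the formula method does before it groups anything. This is a sketch for illustration, not a description of the exact internals.

```r
tab <- data.frame(a = c(1:3, 1:3), b = c(1, 2, NA, 3, NA, NA))

# Drop every row that contains an NA, as na.action = na.omit would.
kept <- na.omit(tab)
print(kept)

# Only groups 1 and 2 survive, so no function passed as FUN
# ever gets to see group 3.
groups_left <- sort(unique(kept$a))
print(groups_left)
```

With na.action = na.pass the rows are left in place, and handling the NAs becomes the job of the function passed as FUN.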
