aggregate(df, …) returning NAs?

Question

I would like to apply the aggregate function to this data frame, grouping by the variables "id" and "var1":

df <- structure(list (id = c(1L,1L,1L,1L,2L,2L,2L,2L),
        var1 = structure(c(1L,1L,2L,2L,1L,1L,2L,2L),
          .Label = c("A", "B"), class = "factor"), 
        var2 = c(1L,2L,1L,2L,1L,2L,1L,2L),
        values = c(37L,20L,22L,18L,30L,5L,41L,50L)),
        .Names = c("id","var1","var2","values"),
        class = "data.frame", row.names = c(NA,-8L))

# looks like
> df
  id var1 var2 values
1  1    A    1     37
2  1    A    2     20
3  1    B    1     22
4  1    B    2     18
5  2    A    1     30
6  2    A    2      5
7  2    B    1     41
8  2    B    2     50

However, if I do this I get a lot of warnings and a column full of NAs:

> agg <- aggregate(df, by=list(df$id, df$var1), mean)
Warning messages:
1: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
3: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
4: In mean.default(X[[i]], ...) :
  argument is not numeric or logical: returning NA
> agg
  Group.1 Group.2 id var1 var2 values
1       1       A  1   NA  1.5   28.5
2       2       A  2   NA  1.5   17.5
3       1       B  1   NA  1.5   20.0
4       2       B  2   NA  1.5   45.5

Is there a way to prevent these warnings? Has my aggregate result lost some data because of them?

Answer 1

Try this

aggregate( . ~ id + var1 , data = df, mean)

#  id var1 var2 values
#1  1    A  1.5   28.5
#2  2    A  1.5   17.5
#3  1    B  1.5   20.0
#4  2    B  1.5   45.5
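
A minimal sketch of why the formula version runs without warnings, assuming the df from the question (rebuilt here with data.frame() so the snippet is self-contained; factor_mean and agg are just illustrative names). mean() is not defined for factors, and the original call handed it the factor column var1 once per group, which matches the four warnings and the NA column. With . on the left-hand side, aggregate() only averages the columns that are not named on the right-hand side.

```r
# Same data as in the question, rebuilt so this snippet runs on its own
df <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  var1   = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  var2   = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  values = c(37L, 20L, 22L, 18L, 30L, 5L, 41L, 50L)
)

# mean() on a factor warns and returns NA; the warning is silenced here
# only to keep the demonstration quiet
factor_mean <- suppressWarnings(mean(df$var1))
is.na(factor_mean)

# id and var1 are grouping variables on the right-hand side, so `.` expands
# to the remaining columns (var2 and values) and no factor reaches mean()
agg <- aggregate(. ~ id + var1, data = df, FUN = mean)
names(agg)
```

The numeric columns in the original result are not affected by the warnings; only the var1 column is NA.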

Here are some other options

Using dplyr

library(dplyr)
df %>% group_by(id, var1) %>% summarize(var2 = mean(var2), values = mean(values))
# or simply (summarise_each() and funs() are deprecated in current dplyr; across() replaces them)
df %>% group_by(id, var1) %>% summarise(across(everything(), mean))

#Source: local data frame [4 x 4]
#Groups: id
#  id var1 var2 values
#1  1    A  1.5   28.5
#2  1    B  1.5   20.0
#3  2    A  1.5   17.5
#4  2    B  1.5   45.5

Using data.table, you have two options:

library(data.table)
setDT(df)[, .(var2 = mean(var2), values = mean(values)), by = .(id, var1)] # option 1
setDT(df)[, lapply(.SD, mean), by=.(id,var1), .SDcols=c("var2","values")] # option 2

#   id var1 var2 values
#1:  1    A  1.5   28.5
#2:  1    B  1.5   20.0
#3:  2    A  1.5   17.5
#4:  2    B  1.5   45.5

Using ddply

library(plyr)
ddply(df, .(id,var1), colwise(mean))

#  id var1 var2 values
#1  1    A  1.5   28.5
#2  1    B  1.5   20.0
#3  2    A  1.5   17.5
#4  2    B  1.5   45.5

Answer 2

You need to limit the data frame passed as argument x to the columns you want FUN to be applied to. In your example, you want to apply the mean function to the values column, grouped by id and var1, so you need to pass df$values instead of the whole df:

agg <- aggregate(df$values, by=list(df$id, df$var1), mean)
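
If you want to keep the by= interface but average more than one column, one possible variant (a sketch, not the only way) is to pass a sub-data-frame as x and to name the elements of the by list. When a bare vector such as df$values is passed, the result column is called x and the grouping columns Group.1 and Group.2; naming the list elements gives more readable headers. The data frame is rebuilt here so the snippet is self-contained, and agg is just an illustrative name.

```r
# Same data as in the question, rebuilt so this snippet runs on its own
df <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  var1   = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  var2   = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  values = c(37L, 20L, 22L, 18L, 30L, 5L, 41L, 50L)
)

# Only the numeric columns go into x, so mean() never sees the factor;
# the names given inside list() become the grouping column names
agg <- aggregate(df[c("var2", "values")],
                 by = list(id = df$id, var1 = df$var1),
                 FUN = mean)
agg
```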

Answer 3

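This happens because your first argument (df) asked aggregate to apply mean to every column of df, including the factor var1, which mean cannot handle, rather than to the single column values.

You want to pass df$values as the first argument instead, as in aggregate(df$values, ...).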

Or use the formula interface as others have said.
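
As a rough sketch of the formula interface for this case (data rebuilt so the snippet is self-contained; agg_values and agg_both are illustrative names), the column to summarise goes on the left of ~ and the grouping variables on the right. Several columns can be combined on the left with cbind().

```r
# Same data as in the question, rebuilt so this snippet runs on its own
df <- data.frame(
  id     = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  var1   = factor(c("A", "A", "B", "B", "A", "A", "B", "B")),
  var2   = c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  values = c(37L, 20L, 22L, 18L, 30L, 5L, 41L, 50L)
)

# Mean of the single column values, grouped by id and var1
agg_values <- aggregate(values ~ id + var1, data = df, FUN = mean)

# Mean of two named columns at once
agg_both <- aggregate(cbind(var2, values) ~ id + var1, data = df, FUN = mean)

agg_values
agg_both
```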

3 answers

solution1
2 ACCEPTED 2015-07-25 09:52:30

solution2
1 2018-02-27 12:40:02

solution3
0 2018-05-04 07:45:42
