
r - Conditional Averaging using Aggregate with lists

I'm trying to write a script to simplify some data analysis, and at some point I need to take averages across some sublists that resemble:

> temp1[[1]]
      Replicate Week Treatment aaa bbb ccc ddd eee
C1_T0         1    0      Cold   1   2   3   4   5
C2_T0         2    0      Cold   1   2   3   4   5
C3_T0         3    0      Cold   1   2   3   4   5
C4_T0         4    0      Cold   1   2   3   4   5
H1_T0         1    0       Hot   1   2   3   4   5
H2_T0         2    0       Hot   1   2   3   4   5
H3_T0         3    0       Hot   1   2   3   4   5
H4_T0         4    0       Hot   1   2   3   4   5

To do this, I tried to use the aggregate function to average every other column by the Treatment column, but this only succeeds for the first column (Replicate); for the remaining columns it returns numbers that are definitely not the mean.

> temp10 <- aggregate( . ~ Treatment, temp1[[1]], mean)
> temp10
  Treatment Replicate Week aaa bbb ccc ddd eee
1      Cold       2.5    1   1   1   1   1   1
2       Hot       2.5    1   1   1   1   1   1

It correctly returns the per-treatment mean of the Replicate column, but I'm not sure why it returns something different for the remaining columns. I would guess that this data structure is incompatible with the mean function, but then I don't see why the Replicate mean is correct. Is there a better way to do this kind of conditional averaging on lists, or would it be better to restructure everything as a data frame?

Probably all your columns except Replicate are factors rather than numeric. You should always check the class of your data.frame columns before doing calculations like this because, unfortunately, aggregate won't warn you that it took means of factors (which probably makes no sense at all).
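A quick way to run that check is to look at the class of every column before aggregating. The sketch below builds a small hypothetical data frame `d` (its name and values are made up for illustration, mimicking data where Week and aaa were imported as factors) and lists the columns that are not numeric:

```r
# Hypothetical example data: Week and aaa were imported as factors
d <- data.frame(
  Replicate = c(1, 2, 3, 4),
  Week      = factor(c(0, 0, 0, 0)),
  Treatment = factor(c("Cold", "Cold", "Hot", "Hot")),
  aaa       = factor(c(1, 1, 1, 1))
)

# Class of every column
col_classes <- sapply(d, class)
print(col_classes)

# Columns that would need converting before taking means
# (Treatment is the grouping variable, so it is expected to be non-numeric)
not_numeric <- names(d)[!vapply(d, is.numeric, logical(1))]
print(not_numeric)
```

Anything other than Treatment showing up in `not_numeric` is a column whose "mean" will not be what you expect.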

To understand what is happening, look at what happens when you convert a factor to numeric:

as.numeric(as.factor(c(10, 10, 10, 10)))
[1] 1 1 1 1
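The 1s are the factor's internal level codes, not the values you typed. As a sketch (the vector `f` is made up for illustration), compare a direct conversion with the two approaches mentioned in R's `?factor` help page for recovering the original numbers:

```r
# A factor whose labels look like numbers
f <- factor(c(10, 20, 10, 30))

# Direct conversion returns the internal level codes: 1 2 1 3
codes <- as.numeric(f)

# Going through the labels returns the original values: 10 20 10 30
values <- as.numeric(as.character(f))

# Same values, converting each level only once and indexing by the codes
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
print(values2)
```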

So, reproducing your problem:

df <- read.table(text = "Replicate Week Treatment aaa bbb ccc ddd eee
C1_T0         1    0      Cold   1   2   3   4   5
C2_T0         2    0      Cold   1   2   3   4   5
C3_T0         3    0      Cold   1   2   3   4   5
C4_T0         4    0      Cold   1   2   3   4   5
H1_T0         1    0       Hot   1   2   3   4   5
H2_T0         2    0       Hot   1   2   3   4   5
H3_T0         3    0       Hot   1   2   3   4   5
H4_T0         4    0       Hot   1   2   3   4   5", header = TRUE)

# Simulate the problem: turn every column except Replicate into a factor
df[-1] <- lapply(df[-1], as.factor)
temp10 <- aggregate( . ~ Treatment, df, mean)
temp10
  Treatment Replicate Week aaa bbb ccc ddd eee
1      Cold       2.5    1   1   1   1   1   1
2       Hot       2.5    1   1   1   1   1   1
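Why is there no warning? As far as I can tell from the R sources, the formula method of aggregate expands the `.` on the left-hand side into a cbind() call, and cbind() replaces factors by their internal integer codes, so mean() only ever sees numbers. Calling mean() on a factor directly, by contrast, warns and returns NA. A minimal sketch with a made-up single-level factor `week`:

```r
# A single-level factor, like the Week and aaa..eee columns above
week <- factor(c(0, 0, 0, 0))

# cbind() drops the factor class and keeps the level codes (all 1 here)
m <- cbind(week)
print(m)

# mean() on the factor itself warns and returns NA
mean_factor <- suppressWarnings(mean(week))
print(mean_factor)

# mean() on the codes is what the formula interface ends up computing
mean_codes <- mean(m[, "week"])
print(mean_codes)
```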

Notice that every mean except Replicate's is 1: those columns were factors, each with a single level, so what got averaged was their internal level code (1) rather than their values. To fix this, convert the columns to numeric properly (for example, with as.numeric(as.character(x)); plain as.numeric(x) would just return the codes again) or make sure your data are imported with the right types in the first place. Once the columns are numeric, aggregate gives the answer you want:

columns <- c("Week", "aaa", "bbb", "ccc", "ddd", "eee")
# Go through as.character() so the level labels, not the level codes, become numbers
df[columns] <- lapply(df[columns], function(x) as.numeric(as.character(x)))
temp10 <- aggregate( . ~ Treatment, df, mean)
temp10
  Treatment Replicate Week aaa bbb ccc ddd eee
1      Cold       2.5    0   1   2   3   4   5
2       Hot       2.5    0   1   2   3   4   5
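Since the question is about elements of a list, the same fix can be applied to every element with lapply(). This is a sketch assuming `temp1` is a list of data frames that share the layout shown in the question; the helper `average_by_treatment`, the generator `make_block` and the two-element toy list are hypothetical stand-ins for the real data:

```r
# Hypothetical helper: convert every column except Treatment to numeric
# (via as.character so factor labels, not codes, are used), then average
# each column within each Treatment group
average_by_treatment <- function(d) {
  cols <- setdiff(names(d), "Treatment")
  d[cols] <- lapply(d[cols], function(x) as.numeric(as.character(x)))
  aggregate(. ~ Treatment, data = d, FUN = mean)
}

# Toy stand-in for temp1: data frames with factor columns
make_block <- function(week) {
  data.frame(
    Replicate = c(1, 2, 1, 2),
    Week      = factor(rep(week, 4)),
    Treatment = factor(c("Cold", "Cold", "Hot", "Hot")),
    aaa       = factor(c(1, 3, 5, 7))
  )
}
temp1 <- list(make_block(0), make_block(2))

# One summary data frame per list element
temp10 <- lapply(temp1, average_by_treatment)
print(temp10[[2]])
```

Keeping the per-element results in a list mirrors the original structure; if you later want a single table, the summaries can be stacked with `do.call(rbind, temp10)`.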
