
r - Conditional Averaging using Aggregate with lists

I'm trying to write a script to simplify some data analysis, and at some point I need to take averages across some sublists that resemble:

> temp1[[1]]
      Replicate Week Treatment aaa bbb ccc ddd eee
C1_T0         1    0      Cold   1   2   3   4   5
C2_T0         2    0      Cold   1   2   3   4   5
C3_T0         3    0      Cold   1   2   3   4   5
C4_T0         4    0      Cold   1   2   3   4   5
H1_T0         1    0       Hot   1   2   3   4   5
H2_T0         2    0       Hot   1   2   3   4   5
H3_T0         3    0       Hot   1   2   3   4   5
H4_T0         4    0       Hot   1   2   3   4   5

To do this, I tried to use the aggregate function to average all the other columns by the Treatment column, but this only succeeds for the first column; the remaining columns come back with numbers that are definitely not the mean.

> temp10 <- aggregate( . ~ Treatment, temp1[[1]], mean)
> temp10
  Treatment Replicate Week aaa bbb ccc ddd eee
1      Cold       2.5    1   1   1   1   1   1
2       Hot       2.5    1   1   1   1   1   1

It correctly returns the mean of the Replicate column by treatment, but I'm not sure why the other columns come back with something different. I would guess that this data structure might be incompatible with the mean function, but then I'm not sure why the Replicate mean is correct. Is there a better way to do this type of conditional averaging in lists, or would it be better to try to restructure everything as a data frame?

Your columns are probably all factors rather than numeric. You should always check the class of your data.frame columns before doing calculations like this because, unfortunately, aggregate won't warn you that it took means of factors (which probably does not make sense at all).
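A quick way to run that check is sketched below. The data frame `d` and its columns are made up for illustration and are not the asker's data; `stringsAsFactors = FALSE` is set only so the result is the same across R versions.

```r
# Hypothetical data frame with one column accidentally stored as a factor.
d <- data.frame(
  Treatment = c("Cold", "Cold", "Hot", "Hot"),
  aaa = factor(c("10", "20", "10", "20")),
  bbb = c(1.5, 2.5, 3.5, 4.5),
  stringsAsFactors = FALSE
)

# Report the (first) class of every column before aggregating.
col_classes <- sapply(d, function(x) class(x)[1])
print(col_classes)

# Names of the columns that are factors and need converting first.
factor_cols <- names(d)[sapply(d, is.factor)]
print(factor_cols)
```

Any column listed in `factor_cols` should be converted before it is passed to mean.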

To understand what is happening, look at what you get when you convert a factor to numeric. The result is the factor's internal integer codes, not the values shown as its labels:

as.numeric(as.factor(c(10, 10, 10, 10)))
[1] 1 1 1 1
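The effect is easier to see with a factor that has more than one level. This is a small sketch using made-up values; it contrasts the integer codes with two common ways of recovering the original numbers.

```r
# A factor built from numbers; its levels are sorted: "10", "20", "30".
f <- factor(c(10, 30, 20, 30))

# as.numeric() on a factor returns the position of each value among the levels.
codes <- as.numeric(f)
print(codes)    # 1 3 2 3

# Going through the labels recovers the original numbers.
values <- as.numeric(as.character(f))
print(values)   # 10 30 20 30

# Equivalent idiom: convert the levels once, then index by the factor.
values2 <- as.numeric(levels(f))[f]
print(values2)  # 10 30 20 30
```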

So, reproducing your problem:

df <- read.table(text = "Replicate Week Treatment aaa bbb ccc ddd eee
C1_T0         1    0      Cold   1   2   3   4   5
C2_T0         2    0      Cold   1   2   3   4   5
C3_T0         3    0      Cold   1   2   3   4   5
C4_T0         4    0      Cold   1   2   3   4   5
H1_T0         1    0       Hot   1   2   3   4   5
H2_T0         2    0       Hot   1   2   3   4   5
H3_T0         3    0       Hot   1   2   3   4   5
H4_T0         4    0       Hot   1   2   3   4   5", header = TRUE)

df[-1] <- lapply(df[-1], as.factor)
temp10 <- aggregate( . ~ Treatment, df, mean)
temp10
  Treatment Replicate Week aaa bbb ccc ddd eee
1      Cold       2.5    1   1   1   1   1   1
2       Hot       2.5    1   1   1   1   1   1

Notice that all the means are 1: each of those columns is a factor with a single level, so every value has the integer code 1. To correct this, either convert the columns to numeric properly (for example, using as.numeric(as.character(x))) or make sure the data are imported as numeric in the first place. Once that is done, aggregate will give you the answer you want:

columns <- c("Week", "aaa", "bbb", "ccc", "ddd", "eee")
df[columns] <- lapply(df[columns], function(x) as.numeric(as.character(x)))
temp10 <- aggregate( . ~ Treatment, df, mean)
temp10
  Treatment Replicate Week aaa bbb ccc ddd eee
1      Cold       2.5    0   1   2   3   4   5
2       Hot       2.5    0   1   2   3   4   5
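As for the list part of the question, one option is to keep the list and apply the same fix to every element with lapply. The following is only a sketch under assumptions: `make_df`, `to_numeric`, `temp_list`, and the data values are hypothetical stand-ins for a list like temp1, not the asker's actual objects.

```r
# Hypothetical list of data frames whose measurement columns are factors.
make_df <- function(offset) {
  data.frame(
    Treatment = rep(c("Cold", "Hot"), each = 2),
    aaa = factor(c(1, 3, 5, 7) + offset),
    bbb = factor(c(2, 4, 6, 8) + offset),
    stringsAsFactors = FALSE
  )
}
temp_list <- list(make_df(0), make_df(10))

# Convert every factor column to numeric via its labels (not its codes).
to_numeric <- function(d) {
  is_fac <- sapply(d, is.factor)
  if (any(is_fac)) {
    d[is_fac] <- lapply(d[is_fac], function(x) as.numeric(as.character(x)))
  }
  d
}

# Aggregate each element of the list by Treatment.
means_list <- lapply(temp_list, function(d) {
  aggregate(. ~ Treatment, data = to_numeric(d), FUN = mean)
})
print(means_list[[1]])
```

Here the grouping column is left as character so that only the measurement columns are converted. If the sublists share the same columns, stacking them into a single data frame with an extra identifier column would be another reasonable design.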
