
How to calculate mean and median of different columns under some conditions using data.table, aggregation with R

I have four vectors (columns):

 x y z  t
 1 1 1 10
 1 1 1 15
 1 4 1 14
 2 3 1 15
 2 2 1 17
 2 1 2 19
 2 4 2 18
 2 4 2 NA
 2 2 2 45
 3 3 2 NA
 3 1 3 59
 4 3 3 23
 4 4 3 45
 4 4 4 74
 5 1 4 86
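
To make the question reproducible, the table above can be entered as a data.table. This is only a sketch: the name `data` is taken from the code used later in the question, and storing all four columns as doubles is an assumption.

```r
library(data.table)

# The four columns from the table above; NA marks the missing t values
data <- data.table(
  x = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  y = c(1, 1, 4, 3, 2, 1, 4, 4, 2, 3, 1, 3, 4, 4, 1),
  z = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  t = c(10, 15, 14, 15, 17, 19, 18, NA, 45, NA, 59, 23, 45, 74, 86)
)
```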

I know how to calculate the mean and median of vector t for each combination of values of x, y and z. For example:

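   library(data.table)

   # every combination of the x, y and z values of interest
   bar <- data.table(expand.grid(x=unique(data[x %in% c(1,2,3,4,5),x]),
                                 y=unique(data[y %in% c(1,2,3,4),y]),
                                 z=unique(data[z %in% c(1,2,3,4),z])))
   # mean and median of t for each observed (x, y, z) group
   foo <- data[z %in% c(1,2,3,4),list(
     mean.t=mean(t,na.rm=TRUE),
     median.t=median(t,na.rm=TRUE))
    ,by=list(x,y,z)]
   # keep all combinations; unobserved ones get NA
   merge(bar[,list(x,y,z)],foo,by=c("x","y","z"),all.x=TRUE)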

The result is:

     x y z mean.t median.t
  1: 1 1 1   12.5     12.5
  2: 1 1 2     NA       NA
  3: 1 1 3     NA       NA
  4: 1 1 4     NA       NA
  5: 1 2 1     NA       NA
  ........................
  79: 5 4 3    NA       NA
  80: 5 4 4    NA       NA

But now I have a question: how do I do the same calculation for x, y, z and t, but with z treated not as the individual numbers 1 to 4 but as groups like:

  if 0 < z <= 2 group I, 
  if 2 < z <= 3 group II and 
  if 3 < z <= 4 group III.

So, the output should be in a format like:

     x y z    mean.t median.t
  1: 1 1 I   
  2: 1 1 II     
  3: 1 1 III     
  4: 1 2 I     
  5: 1 2 II     
  6: 1 2 III     
  7: 1 3 I     
  8: 1 3 II     
  9: 1 3 III     
 10: 1 4 I  
 ..........
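
One possible way to get output in this shape, sketched under assumptions: the intervals are expressed with base R's `cut()` (right-closed by default), the grouping column is called `zGroup` (a name chosen here for illustration), and data.table's `CJ()` builds the full grid in place of `expand.grid`. The data is the table from the question, entered as doubles.

```r
library(data.table)

data <- data.table(
  x = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5),
  y = c(1, 1, 4, 3, 2, 1, 4, 4, 2, 3, 1, 3, 4, 4, 1),
  z = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4),
  t = c(10, 15, 14, 15, 17, 19, 18, NA, 45, NA, 59, 23, 45, 74, 86)
)

# Map z to the bins (0,2], (2,3], (3,4] and store the label as a character
data[, zGroup := as.character(
  cut(z, breaks = c(0, 2, 3, 4), labels = c("I", "II", "III"))
)]

# Mean and median of t within each observed (x, y, zGroup) combination
foo <- data[, .(mean.t = mean(t, na.rm = TRUE),
                median.t = as.double(median(t, na.rm = TRUE))),
            by = .(x, y, zGroup)]

# Every combination of x, y and zGroup, including unobserved ones
bar <- CJ(x = unique(data$x), y = unique(data$y), zGroup = c("I", "II", "III"))

# Left join so that unobserved combinations are kept with NA statistics
res <- merge(bar, foo, by = c("x", "y", "zGroup"), all.x = TRUE)
```

Here `res` has one row per combination (5 x values, 4 y values, 3 groups), and combinations that never occur in the data carry `NA` in both statistics columns.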

Define a new column, zGroup, to group by.

(The data in this example is a little different from yours.)

library(data.table)

# create some data
dt <- data.table(x = rep(c(1, 2), each = 4),
                 y = rep(c(1, 2), each = 2, times = 2),
                 z = rep(c(1, 2, 3, 4), times = 2),
                 t = 1:8)

# add a zGroup column
dt[0 < z & z <= 2, zGroup := 1]
dt[2 < z & z <= 3, zGroup := 2]
dt[3 < z & z <= 4, zGroup := 3]

# group by unique combinations of x, y, zGroup, taking the mean and median of t;
# as.double() keeps the median's type the same in every group
dt[, list(mean.t = mean(t), median.t = as.double(median(t))), by = list(x, y, zGroup)]

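Note that this will error without coercing the median to a double: median() returns an integer for an odd-length integer vector but a double for an even-length one, so the column type would differ between groups. See this post for details.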
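
The type difference behind this note can be seen in base R alone. A small sketch (the variable names `odd` and `even` are just for illustration):

```r
# median() on an integer vector: an odd length returns the middle element,
# an even length returns the mean of the two middle elements
odd  <- median(c(1L, 2L, 3L))      # stays integer
even <- median(c(1L, 2L, 3L, 4L))  # becomes double

class(odd)   # "integer"
class(even)  # "numeric"

# Coercing gives the same type regardless of how many values a group has
class(as.double(odd))  # "numeric"
```

Since `t = 1:8` is an integer column and the groups in the example have different sizes, wrapping the median in `as.double()` avoids mixing the two types across groups.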


 