R data.table: How to calculate the ratio of the sum of values, for columns from a vector, within the group vs. the rest of the values in those columns?
Let's say we have a DT:
A | B | C |
---|---|---|
1 | 1 | 1 |
1 | 2 | 3 |
2 | 3 | 5 |
2 | 4 | 7 |
3 | 5 | 9 |
3 | 6 | 11 |
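To follow along, the example table can be built as a `data.table`. This sketch assumes all three columns are plain numeric vectors (the question does not state their types):

```r
library(data.table)

# The example table from the question
DT <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))
DT
```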
Let's say we want to obtain, for each group of A (c(1, 2, 3)), the ratio of the mean values of B and C within that group to the mean values of B and C over all other groups of A (i.e. excluding that group)...
Let's say the "columns to calculate the mean for" are `vars = c("B", "C")`.
How can this be done automatically for all column names in a given vector, i.e. `c("B", "C")`?
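Written out, this is one reading of the requested quantity, for a group $g$ of A and a column $v$ from the vector, where $n$ is the number of rows and $G_g$ is the set of rows in group $g$:

$$
r_{g,v} = \frac{\dfrac{1}{|G_g|}\sum_{i \in G_g} v_i}{\dfrac{1}{n - |G_g|}\sum_{i \notin G_g} v_i}
$$

For example, with $g = 1$ and $v = B$: the group mean is $(1 + 2)/2 = 1.5$, the mean of the other rows is $(3 + 4 + 5 + 6)/4 = 4.5$, so $r_{1,B} = 1.5 / 4.5 = 1/3$.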
If I understood your question correctly:
```r
library(data.table)

# dt is the example table (columns A, B, C)
cols <- c('A', 'C')                              # grouping columns
cols.remaining <- setdiff(colnames(dt), cols)    # columns to average
global.means <- paste0("GlobalMean", cols.remaining)
group.means  <- paste0("GroupMean", cols.remaining)
group.ratios <- paste0("GroupRatio", cols.remaining)
dt[, (global.means) := lapply(.SD, mean), .SDcols = cols.remaining][
   , (group.means) := lapply(.SD, mean), by = cols, .SDcols = cols.remaining][
   , (group.ratios) := mapply(group.means, global.means,
                              FUN = function(m, g) get(m) / get(g),
                              SIMPLIFY = FALSE)][]
```
```
   A B C GlobalMeanB GroupMeanB GroupRatioB
1: 1 1 1 3.5 1 0.2857143
2: 1 2 3 3.5 2 0.5714286
3: 2 3 5 3.5 3 0.8571429
4: 2 4 7 3.5 4 1.1428571
5: 3 5 9 3.5 5 1.4285714
6: 3 6 11 3.5 6 1.7142857
```
And with `cols <- c('B')`:
```
   A B C GlobalMeanA GlobalMeanC GroupMeanA GroupMeanC GroupRatioA GroupRatioC
1: 1 1 1 2 6 1 1 0.5 0.1666667
2: 1 2 3 2 6 1 3 0.5 0.5000000
3: 2 3 5 2 6 2 5 1.0 0.8333333
4: 2 4 7 2 6 2 7 1.0 1.1666667
5: 3 5 9 2 6 3 9 1.5 1.5000000
6: 3 6 11 2 6 3 11 1.5 1.8333333
```
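The `get()` lookup inside `mapply` works, but a variant that avoids `get()` passes the overall means in as a list instead. This is a minimal sketch that only reproduces the `GroupRatio` columns for the first case (`cols <- c('A', 'C')`); it rebuilds the example table as `dt` assuming numeric columns, and `overall` and `ratio.cols` are illustrative names, not part of the answer above:

```r
library(data.table)

dt <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))

cols <- c("A", "C")                            # grouping columns
cols.remaining <- setdiff(colnames(dt), cols)  # columns to average
ratio.cols <- paste0("GroupRatio", cols.remaining)

# Whole-table mean of each remaining column, as a one-row data.table
overall <- dt[, lapply(.SD, mean), .SDcols = cols.remaining]

# Group mean divided by the whole-table mean; := recycles the
# length-1 result over the rows of each group
dt[, (ratio.cols) := Map(function(x, m) mean(x) / m, .SD, overall),
   by = cols, .SDcols = cols.remaining]
dt
```

`Map` pairs `.SD` and `overall` column by column by position, which holds here because both are built from the same `cols.remaining`.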
Here is one possible way to solve your problem.
```r
library(data.table)

# ratios of means related to columns B and C grouped by A
cols = c("B", "C")
DT[, Map(`/`, lapply(.SD, mean), lapply(DT[-.I, cols, with=FALSE], mean)), by=.(A), .SDcols=cols]
#    A         B    C
# 1: 1 0.3333333 0.25
# 2: 2 1.0000000 1.00
# 3: 3 2.2000000 2.50

# alternative solution (gives the same result)
DT[, Map(`/`, lapply(.SD, mean), lapply(DT[!.BY, cols, with=FALSE, on=.(A)], mean)), by=.(A), .SDcols=cols]
```
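The alternative relies on a data.table not-join: `DT[!i, on = ...]` returns the rows of `DT` that do *not* match `i`. Inside a grouped query, `.BY` is a list holding the current group's key, so `DT[!.BY, ..., on = .(A)]` selects every row outside that group. A small standalone check, assuming the example table with numeric columns (`others` and `rest_counts` are illustrative names):

```r
library(data.table)

DT <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))

# Not-join on A: rows whose A does not match 2.
# list(A = 2) has the same shape as .BY for the group A == 2.
others <- DT[!list(A = 2), on = .(A)]
others

# Inside a grouped query, .BY is the current group's key,
# so this counts the rows outside each group
rest_counts <- DT[, .(n_rest = nrow(DT[!.BY, on = .(A)])), by = A]
rest_counts
```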
`lapply(.SD, mean)` computes the group's means.
`lapply(DT[-.I, cols, with=FALSE], mean)` computes the means excluding the current group (`.I` holds the row numbers of the current group, so `-.I` drops them).
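To see which rows `-.I` removes, this sketch prints the row range of each group and then computes the "rest" mean of B by hand. It assumes the example table with numeric columns; `rows` and `rest_means` are illustrative names:

```r
library(data.table)

DT <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))

# .I holds the row numbers (in DT) of the current group
rows <- DT[, .(first_row = min(.I), last_row = max(.I)), by = A]
rows

# DT$B is the whole column; [-.I] drops the current group's rows
rest_means <- DT[, .(mean_B_rest = mean(DT$B[-.I])), by = A]
rest_means
```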
The `Map` function then uses the division operator, `/`, to divide the group's means (calculated by `lapply(.SD, mean)`) by the means excluding the current group (calculated by `lapply(DT[-.I, cols, with=FALSE], mean)`) element-wise, i.e. column by column.
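The element-wise division can be seen in isolation with base R. This sketch hard-codes the group and "rest" means for A == 1 from the example table (`group_means`, `rest_means` and `ratios` are illustrative names):

```r
# Group means and "rest" means for group A == 1 (values from the example table)
group_means <- list(B = 1.5, C = 2)
rest_means  <- list(B = 4.5, C = 8)

# Map() applies `/` pairwise: first with first, second with second.
# The result takes its names from the first list.
ratios <- Map(`/`, group_means, rest_means)
str(ratios)
```

`Map` pairs elements by position, not by name, which is safe in the answer because both lists come from the same `cols` in the same order.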
For other scenarios, you just adapt the `.SDcols` and `by` arguments appropriately.
```r
# ratios of means related to column B grouped by A and C
cols = "B"
DT[, Map(`/`, lapply(.SD, mean), lapply(DT[-.I, cols, with=FALSE], mean)), by=.(A, C), .SDcols=cols]
#    A  C         B
# 1: 1  1 0.2500000
# 2: 1  3 0.5263158
# 3: 2  5 0.8333333
# 4: 2  7 1.1764706
# 5: 3  9 1.5625000
# 6: 3 11 2.0000000
```
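To do this "automatically for all column names in a given vector", the one-liner can be wrapped in a function that takes the value columns and the grouping columns as character vectors. This is a minimal sketch: `ratio_vs_rest`, `cols` and `group_by` are hypothetical names, and the value columns are assumed to be numeric:

```r
library(data.table)

DT <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))

# Hypothetical helper: for each group, mean of each column in `cols`
# divided by the mean of that column over all rows outside the group
ratio_vs_rest <- function(DT, cols, group_by) {
  DT[, Map(`/`, lapply(.SD, mean), lapply(DT[-.I, cols, with = FALSE], mean)),
     by = group_by, .SDcols = cols]
}

res1 <- ratio_vs_rest(DT, c("B", "C"), "A")       # first example above
res2 <- ratio_vs_rest(DT, "B", c("A", "C"))       # second example above
res1
res2
```

If the grouping puts every row into a single group, `DT[-.I]` is empty, its mean is `NaN`, and so is the ratio.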
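Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.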