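R data.table: How to calculate the ratio of the sum of values within a group to the rest of the values in a column, for a vector of columns?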

Question

Let's say we have a DT:

A	B	C
1	1	1
1	2	3
2	3	5
2	4	7
3	5	9
3	6	11
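For anyone who wants to run the answers below, the table can be rebuilt roughly like this. This is a minimal sketch: the column types (numeric rather than integer) are an assumption, and the copy named `dt` is only there because Answer 1 uses that name while Answer 2 uses `DT`.

```r
library(data.table)

# Rebuild the example table from the question (numeric columns are an assumption)
DT <- data.table(
  A = c(1, 1, 2, 2, 3, 3),
  B = c(1, 2, 3, 4, 5, 6),
  C = c(1, 3, 5, 7, 9, 11)
)

# Answer 1 modifies its table by reference, so give it its own copy
dt <- copy(DT)
```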

Let's say we want to obtain, for each group of A (1, 2, 3), the ratio of the mean values of B and C within that group to the mean values of B and C over all other groups (that is, excluding the current group of A).
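To make the target quantity concrete, here is the calculation done by hand in base R for column B and group A = 1. It is only an illustration of the definition above; the variable names are made up for this sketch.

```r
A <- c(1, 1, 2, 2, 3, 3)
B <- c(1, 2, 3, 4, 5, 6)

in_group  <- A == 1
mean_in   <- mean(B[in_group])    # mean of B inside group A == 1: 1.5
mean_rest <- mean(B[!in_group])   # mean of B over all other groups: 4.5
ratio     <- mean_in / mean_rest  # 1.5 / 4.5 = 0.3333333
```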

Let's say the columns to calculate the mean for are _vars = c(B, C).

What if I am grouping by A and C, and _vars = c(B) only?
Update:

How can this be done automatically for all column names in a given vector, i.e. c("B","C")?

Answer 1

If I understood your question correctly:

cols <- c('A', 'C')                              # grouping columns
cols.remaining <- setdiff(colnames(dt), cols)    # columns to average

global.means <- paste0("GlobalMean", cols.remaining)
group.means  <- paste0("GroupMean", cols.remaining)
group.ratios <- paste0("GroupRatio", cols.remaining)

# 1) means over the whole table, 2) means per group, 3) ratio of group mean to overall mean
dt[, (global.means) := lapply(.SD, mean), .SDcols = cols.remaining][
   , (group.means) := lapply(.SD, mean), by = cols, .SDcols = cols.remaining][
   , (group.ratios) := mapply(group.means, global.means, FUN = function(m, g) get(m) / get(g), SIMPLIFY = FALSE)][]

   A B  C GlobalMeanB GroupMeanB GroupRatioB
1: 1 1  1         3.5          1   0.2857143
2: 1 2  3         3.5          2   0.5714286
3: 2 3  5         3.5          3   0.8571429
4: 2 4  7         3.5          4   1.1428571
5: 3 5  9         3.5          5   1.4285714
6: 3 6 11         3.5          6   1.7142857

and with cols <- c('B'):

   A B  C GlobalMeanA GlobalMeanC GroupMeanA GroupMeanC GroupRatioA GroupRatioC
1: 1 1  1           2           6          1          1         0.5   0.1666667
2: 1 2  3           2           6          1          3         0.5   0.5000000
3: 2 3  5           2           6          2          5         1.0   0.8333333
4: 2 4  7           2           6          2          7         1.0   1.1666667
5: 3 5  9           2           6          3          9         1.5   1.5000000
6: 3 6 11           2           6          3         11         1.5   1.8333333
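Note that the code above divides each group mean by the mean over the whole table, current group included. If the denominator should exclude the current group, as the question describes, one possible variant (a sketch, not the answer author's method) derives the "rest" mean from totals: (total sum - group sum) / (total rows - group rows). The names `by_cols`, `mean_cols` and `agg` are made up for this illustration.

```r
library(data.table)

dt <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))

by_cols   <- "A"            # grouping columns (assumed for this sketch)
mean_cols <- c("B", "C")    # columns to average

# One row per group: row count plus the sum of every column of interest
agg <- dt[, c(list(n = .N), lapply(.SD, sum)), by = by_cols, .SDcols = mean_cols]

total_n <- nrow(dt)
for (col in mean_cols) {
  total_sum  <- sum(dt[[col]])
  group_mean <- agg[[col]] / agg$n
  # Mean over all rows outside the group, derived from the totals
  rest_mean  <- (total_sum - agg[[col]]) / (total_n - agg$n)
  agg[, (paste0("Ratio", col)) := group_mean / rest_mean]
}
```

With the example data this yields RatioB = 0.3333333, 1, 2.2 and RatioC = 0.25, 1, 2.5 for groups 1, 2 and 3. If a single group covers the whole table, the denominator has zero rows and the ratio is NaN.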

Answer 2

Here is one possible way to solve your problem.

# ratios of means related to column B and C grouped by A
cols = c("B", "C")
DT[, Map(`/`, lapply(.SD, mean), lapply(DT[-.I, cols, with=FALSE], mean)), by=.(A), .SDcols=cols]
#        A         B     C
# 1:     1 0.3333333  0.25
# 2:     2 1.0000000  1.00
# 3:     3 2.2000000  2.50

# alternative solution (gives the same result)
DT[, Map(`/`, lapply(.SD, mean), lapply(DT[!.BY, cols, with=FALSE, on=.(A)], mean)), by=.(A), .SDcols=cols]

lapply(.SD, mean) computes the means within the current group.
lapply(DT[-.I, cols, with=FALSE], mean) computes the means over all rows outside the current group (.I holds the row numbers of the current group, so DT[-.I] drops them).
Map then applies the division operator, `/`, element-wise, giving for each column the ratio between the group's mean (from lapply(.SD, mean)) and the mean excluding the current group (from lapply(DT[-.I, cols, with=FALSE], mean)).
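The two building blocks described above can be looked at in isolation. The following is a small sketch with made-up variable names: the first part shows Map dividing two named lists pairwise, the second shows which row numbers .I contains for each group of A.

```r
library(data.table)

# Map applies `/` to matching elements of the two lists
group_means <- list(B = 1.5, C = 2)   # means inside group A == 1
rest_means  <- list(B = 4.5, C = 8)   # means over the other groups
ratios <- Map(`/`, group_means, rest_means)   # list(B = 0.3333333, C = 0.25)

# .I is the vector of row numbers belonging to the current group
DT <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))
rows_per_group <- DT[, .(rows = list(.I)), by = A]   # rows 1-2, 3-4, 5-6
```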

For other scenarios, you just adapt the .SDcols and by arguments accordingly.

# ratios of means related to column B grouped by A and C.
cols = "B"
DT[, Map(`/`, lapply(.SD, mean), lapply(DT[-.I, cols, with=FALSE], mean)), by=.(A, C), .SDcols=cols]
#        A     C         B
# 1:     1     1 0.2500000
# 2:     1     3 0.5263158
# 3:     2     5 0.8333333
# 4:     2     7 1.1764706
# 5:     3     9 1.5625000
# 6:     3    11 2.0000000
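Since only .SDcols and by change between scenarios, the expression could be wrapped in a small function. `ratio_to_rest` and its argument names are hypothetical (not part of data.table); this is a sketch that reuses the same expression as above.

```r
library(data.table)

# Hypothetical helper: ratio of each group's mean to the mean of all rows
# outside that group, for every column named in `cols`
ratio_to_rest <- function(DT, cols, by_cols) {
  DT[, Map(`/`,
           lapply(.SD, mean),
           lapply(DT[-.I, cols, with = FALSE], mean)),
     by = by_cols, .SDcols = cols]
}

DT <- data.table(A = c(1, 1, 2, 2, 3, 3),
                 B = c(1, 2, 3, 4, 5, 6),
                 C = c(1, 3, 5, 7, 9, 11))

res_A  <- ratio_to_rest(DT, cols = c("B", "C"), by_cols = "A")
res_AC <- ratio_to_rest(DT, cols = "B", by_cols = c("A", "C"))
```

The two calls correspond to the two scenarios shown in this answer, so `res_A` and `res_AC` should match the printed tables.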

Answer 1
0 2022-06-11 05:51:34

Answer 2
0 2022-06-13 22:58:08
