加入两个data.tables后创建新列

Question

I have two data.tables, main and metrics , both keyed by cid I want to add to table main the average of each of several values located in metrics. 我有两个data.tables， main和metrics ，都由cid键入我想要向表main添加指标中每个值的平均值。

However, I would like to filter by code , only averaging those rows in metrics with a given code . 但是，我想按code进行过滤，仅使用给定code对metrics行进行平均。

> metrics
    cid code  DZ value1 value2
1: 1001    A 101      8     21
2: 1001    B 102     11     26
3: 1001    A 103     17     25
4: 1002    A 104     25     39
5: 1002    B 105      6     30
6: 1002    A 106     23     40
7: 1003    A 107     27     32
8: 1003    B 108     16     37
9: 1003    A 109     14     42

# DESIRED OUTPUT
> main
    cid  A.avg.val1   A.avg.val2    B.avg.val1      B.avg.val2    
1: 1001    12.5         23.0            11              26                      
2: 1002    24.0         39.5             6              30            
3: 1003    20.5         37.0            16              37            



#  SAMPLE DATA
set.seed(1)
main <- data.table(cid=1e3+1:3, key="cid")
metrics <- data.table(cid=rep(1e3+1:3, each=3), code=rep(c("A", "B", "A"), 3), DZ=101:109, value1=sample(30, 9), value2=sample(20:50, 9), key="cid")
code.filters <- c("A", "B")

These lines get the desired output, but I am having difficulty assigning the new col back into main. 这些行获得了所需的输出，但是我很难将新的col分配回main。 (also, doing it programatically would be preferred). （另外，以编程方式进行是首选）。

main[metrics[code==code.filters[[1]]]][,  list(mean(c(value1))), by=cid]
main[metrics[code==code.filters[[1]]]][,  list(mean(c(value2))), by=cid]
main[metrics[code==code.filters[[2]]]][,  list(mean(c(value1))), by=cid]
main[metrics[code==code.filters[[1]]]][,  list(mean(c(value2))), by=cid]

Additionally, can someone explain why the following line only takes the last value in each group? 此外，有人可以解释为什么以下行只取每组中的最后一个值？

main[metrics[ code=="A"],  A.avg.val1 := mean(c(value1))]

Answer 1

You don't need main . 你不需要main 。 You can get it directly from metrics as follows: 您可以直接从metrics获取它，如下所示：

> tmp.dt <- metrics[, list(A.avg.val1 = mean(value1[code=="A"]), 
                 A.avg.val2 = mean(value2[code=="A"]), 
                 B.avg.val1 = mean(value1[code == "B"]), 
                 B.avg.val2 = mean(value2[code == "B"])), by=cid]

#     cid A.avg.val1 A.avg.val2 B.avg.val1 B.avg.val2
# 1: 1001       12.5       23.0         11         26
# 2: 1002       24.0       39.5          6         30
# 3: 1003       20.5       37.0         16         37

If you still want to subset with main just do: 如果您仍想使用main进行子集，请执行以下操作：

main <- data.table(cid = c(1001:1002))
> tmp.dt[main]

#     cid A.avg.val1 A.avg.val2 B.avg.val1 B.avg.val2
# 1: 1001       12.5       23.0         11         26
# 2: 1002       24.0       39.5          6         30

Answer 2

I would do this in two steps. 我会分两步完成。 First, get your means, then reshape the data 首先，获取方法，然后reshape数据

foo <- main[metrics]
bar <- foo[, list(val1 = mean(value1), 
                  val2 = mean(value2)), 
           by=c('cid', 'code')]

library(reshape2)
bar.melt <- melt(bar, id.var=c('cid', 'code'))
dcast(data=bar.melt,
      cid ~ code + variable)

But really, I'd probably leave the data in the "long" format because I find it much easier to work with! 但实际上，我可能会将数据保留为“长”格式，因为我发现它更容易使用！

Answer 3

working off of @Arun's answer, the following gets the desired results: 根据@ Arun的回答，以下内容得到了预期的结果：

invisible( 
sapply(code.filters, function(cf)
    main[metrics[code==cf, list(avgv1 = mean(value1), avgv2 = mean(value2)), by=cid],
      paste0(cf, c(".avg.val1", ".avg.val2")) :=list(avgv1, avgv2)]
))

> main
    cid A.avg.val1 A.avg.val2 B.avg.val1 B.avg.val2
1: 1001       12.5       23.0         11         26
2: 1002       24.0       39.5          6         30
3: 1003       20.5       37.0         16         37

加入两个data.tables后创建新列

问题描述

3 个解决方案

解决方案1
3 2013-01-24 20:48:57

解决方案2
2 2013-01-24 20:49:27

解决方案3
2 已采纳 2013-01-24 21:04:27

加入两个data.tables后创建新列

问题描述

3 个解决方案

解决方案1 3 2013-01-24 20:48:57

解决方案2 2 2013-01-24 20:49:27

解决方案3 2 已采纳 2013-01-24 21:04:27

解决方案1
3 2013-01-24 20:48:57

解决方案2
2 2013-01-24 20:49:27

解决方案3
2 已采纳 2013-01-24 21:04:27