How do I find the sum of inequalities by unique group/subgroup pairs?
Suppose I am working with the following data.table:
library(data.table)

dta <- setDT(
  data.frame(
    id = c("A","A","A","B","B","C","C","C"),
    subid = c(1,1,2,1,2,1,1,1),
    x1 = c(1,1,3,1,2,3,3,3),
    x2 = c(3,3,1,1,1,3,3,3)
  )
)
> dta
id subid x1 x2
1: A 1 1 3
2: A 1 1 3
3: A 2 3 1
4: B 1 1 1
5: B 2 2 1
6: C 1 3 3
7: C 1 3 3
8: C 1 3 3
For each unique id-subid pairing, I would like to find the total number of times that x1<x2 and the total number of times that x1>=x2, and have those counts added to the data.table as new columns/variables, aggregated to the id level.
The outcome would look something like:
id subid x1 x2 lt gt
1: A 1 1 3 1 1
2: A 1 1 3 1 1
3: A 2 3 1 1 1
4: B 1 1 1 0 2
5: B 2 2 1 0 2
6: C 1 3 3 0 1
7: C 1 3 3 0 1
8: C 1 3 3 0 1
For example, of the two unique id-subid pairings for id="A", one has x1<x2 and one has x1>x2, which means that for A the "less-than" variable has a value of 1 (i.e. dta$lt[dta$id=="A"] <- 1), and the same holds for "greater-than" (dta$gt[dta$id=="A"] <- 1).
I have been searching for a solution to this but have not had much luck. I have found solutions to similar problems (e.g. counting the number of unique observations by unique pairings), but have not been able to modify them to suit my needs. In particular, I am struggling to aggregate the count from the id-subid level to the id level. (It could be that I'm not exactly sure how to search for, or even word, this question.)
I've been able to do this using nested loops on a data frame, but I suspect there is a more efficient way of doing it. In particular, I am curious about doing this using data.table.
A possible solution:
dta[, c('lt', 'gt') := unique(.SD)[, .(sum(x1 < x2), sum(x1 >= x2))], by = .(id)]
which gives:
> dta
   id subid x1 x2 lt gt
1:  A     1  1  3  1  1
2:  A     1  1  3  1  1
3:  A     2  3  1  1  1
4:  B     1  1  1  0  2
5:  B     2  2  1  0  2
6:  C     1  3  3  0  1
7:  C     1  3  3  0  1
8:  C     1  3  3  0  1
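If the one-liner is hard to follow, the same result can probably be reached in three explicit steps, which also makes the move from the id-subid level to the id level visible. This is only a sketch: the names `pairs` and `counts` are illustrative and not part of the original answer. It also assumes, as in the sample data, that rows sharing an id-subid pair have identical x1 and x2 values; if they differ, each distinct row is counted separately, exactly as with `unique(.SD)`.

```r
library(data.table)

dta <- data.table(
  id    = c("A","A","A","B","B","C","C","C"),
  subid = c(1,1,2,1,2,1,1,1),
  x1    = c(1,1,3,1,2,3,3,3),
  x2    = c(3,3,1,1,1,3,3,3)
)

# Step 1: drop duplicate rows so each id-subid pairing is counted once
# (assumes x1 and x2 are constant within a pairing, as in the sample data)
pairs <- unique(dta[, .(id, subid, x1, x2)])

# Step 2: count the inequalities at the id level
counts <- pairs[, .(lt = sum(x1 < x2), gt = sum(x1 >= x2)), by = id]

# Step 3: update join, writing the id-level counts onto every row of dta
dta[counts, `:=`(lt = i.lt, gt = i.gt), on = "id"]

dta
```

The intermediate `counts` table holds one row per id, which can be handy for checking the aggregation before it is joined back.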