
How do I find the sum of inequalities by unique group/subgroup pairs?

Suppose I am working with the following data.table:

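library(data.table)  # provides setDT() and the dta[...] syntax used below

# Build the example data and convert it to a data.table by reference
dta <- setDT(
  data.frame(
    id = c("A","A","A","B","B","C","C","C"),
    subid = c(1,1,2,1,2,1,1,1),
    x1 = c(1,1,3,1,2,3,3,3),
    x2 = c(3,3,1,1,1,3,3,3)
  )
)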

> dta
   id subid x1 x2
1:  A     1  1  3
2:  A     1  1  3
3:  A     2  3  1
4:  B     1  1  1
5:  B     2  2  1
6:  C     1  3  3
7:  C     1  3  3
8:  C     1  3  3

For each unique id-subid pairing, I would like to find the total number of times that x1<x2 and the total number of times that x1>=x2, and have those counts added to the data.table as new columns/variables, aggregated to the id level.

The outcome would look something like:

   id subid x1 x2 lt gt
1:  A     1  1  3  1  1
2:  A     1  1  3  1  1
3:  A     2  3  1  1  1
4:  B     1  1  1  0  2
5:  B     2  2  1  0  2
6:  C     1  3  3  0  1
7:  C     1  3  3  0  1
8:  C     1  3  3  0  1

For example, of the two unique id-subid pairings for id="A", one has x1<x2 and one has x1>x2, so for A the "less-than" variable has a value of 1 (i.e. dta$lt[dta$id=="A"] <- 1), and likewise for "greater-than" (dta$gt[dta$id=="A"] <- 1).

I have been searching for a solution to this but have not had much luck. I have found solutions to similar problems (e.g. counting the number of unique observations by unique pairings), but have not been able to modify them to suit my needs. In particular, I am struggling to aggregate the count from the id-subid level to the id level. (It could be that I'm not exactly sure how to search for, or even word, this question.)

I've been able to do this using nested loops on a data frame, but I suspect there is a more efficient way of doing it. In particular, I am curious about doing this using data.table.

A possible solution:

dta[, c('lt', 'gt') := unique(.SD)[, .(sum(x1 < x2), sum(x1 >= x2))], by = .(id)]

which gives:

> dta
   id subid x1 x2 lt gt
1:  A     1  1  3  1  1
2:  A     1  1  3  1  1
3:  A     2  3  1  1  1
4:  B     1  1  1  0  2
5:  B     2  2  1  0  2
6:  C     1  3  3  0  1
7:  C     1  3  3  0  1
8:  C     1  3  3  0  1
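The one-liner packs three steps into a single call, so it may help to see them separately. The following is only a sketch of an equivalent approach, not the answerer's own code: it removes duplicate rows, counts per id, and then writes the counts back with an update join. The intermediate names `pairs` and `counts` are made up for illustration.

```r
library(data.table)

# Same example data as in the question
dta <- data.table(
  id    = c("A","A","A","B","B","C","C","C"),
  subid = c(1,1,2,1,2,1,1,1),
  x1    = c(1,1,3,1,2,3,3,3),
  x2    = c(3,3,1,1,1,3,3,3)
)

# Step 1: keep one row per distinct (id, subid, x1, x2) combination
# (`pairs` is a hypothetical name)
pairs <- unique(dta[, .(id, subid, x1, x2)])

# Step 2: count the inequalities at the id level
# (`counts` is a hypothetical name)
counts <- pairs[, .(lt = sum(x1 < x2), gt = sum(x1 >= x2)), by = id]

# Step 3: update join, copying the per-id counts onto every row of dta
dta[counts, on = "id", `:=`(lt = i.lt, gt = i.gt)]

print(dta)
```

Like `unique(.SD)` in the one-liner, step 1 deduplicates on subid, x1 and x2 together. If x1 or x2 could differ between rows of the same id-subid pair, each distinct combination would be counted separately.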
