How to find the sum of inequalities by unique group/subgroup pairs?

Question

Suppose I am working with the following data.table:

library(data.table)

dta <- setDT(
  data.frame(
    id = c("A","A","A","B","B","C","C","C"),
    subid = c(1,1,2,1,2,1,1,1),
    x1 = c(1,1,3,1,2,3,3,3),
    x2 = c(3,3,1,1,1,3,3,3)
  )
)

> dta
   id subid x1 x2
1:  A     1  1  3
2:  A     1  1  3
3:  A     2  3  1
4:  B     1  1  1
5:  B     2  2  1
6:  C     1  3  3
7:  C     1  3  3
8:  C     1  3  3

For each unique id-subid pairing, I would like to count the number of times that x1 < x2 and the number of times that x1 >= x2, and add those counts to the data.table as new columns, aggregated to the id level.

The outcome would look something like:

   id subid x1 x2 lt gt
1:  A     1  1  3  1  1
2:  A     1  1  3  1  1
3:  A     2  3  1  1  1
4:  B     1  1  1  0  2
5:  B     2  2  1  0  2
6:  C     1  3  3  0  1
7:  C     1  3  3  0  1
8:  C     1  3  3  0  1

For example, of the two unique id-subid pairings for id = "A", one has x1 < x2 and one has x1 > x2, so for A the "less-than" variable has a value of 1 (i.e. dta$lt[dta$id == "A"] <- 1), and likewise for "greater-than" (dta$gt[dta$id == "A"] <- 1).

I have been searching for a solution to this but have not had much luck. I have found solutions to similar problems (e.g. counting the number of unique observations by unique pairings), but have not been able to modify them to suit my needs. In particular, I am struggling to aggregate the count from the id-subid level to the id level. (It could be that I'm not exactly sure how to search for, or even word, this question.)

I've been able to do this using nested loops on a data frame, but I suspect there is a more efficient way of doing it. In particular, I am curious about doing this using data.table.

Answer 1

A possible solution:

dta[, c('lt', 'gt') := unique(.SD)[, .(sum(x1 < x2), sum(x1 >= x2))], by = .(id)]

which gives:

> dta
   id subid x1 x2 lt gt
1:  A     1  1  3  1  1
2:  A     1  1  3  1  1
3:  A     2  3  1  1  1
4:  B     1  1  1  0  2
5:  B     2  2  1  0  2
6:  C     1  3  3  0  1
7:  C     1  3  3  0  1
8:  C     1  3  3  0  1
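
For readers who find the one-liner dense, the same idea can be sketched in three explicit steps: deduplicate, count per id, then join the counts back. This is only an illustrative breakdown, not the answerer's own code; the intermediate names `pairs` and `counts` are made up for this sketch. Like `unique(.SD)` in the answer, it deduplicates on all of subid, x1 and x2, so it assumes x1 and x2 are constant within each id-subid pair (as they are in the sample data).

```r
library(data.table)

dta <- data.table(
  id    = c("A","A","A","B","B","C","C","C"),
  subid = c(1,1,2,1,2,1,1,1),
  x1    = c(1,1,3,1,2,3,3,3),
  x2    = c(3,3,1,1,1,3,3,3)
)

# Step 1: keep one row per distinct (id, subid, x1, x2) combination
# (`pairs` is a hypothetical helper name)
pairs <- unique(dta[, .(id, subid, x1, x2)])

# Step 2: count the inequalities among those rows, one result row per id
# (`counts` is a hypothetical helper name)
counts <- pairs[, .(lt = sum(x1 < x2), gt = sum(x1 >= x2)), by = id]

# Step 3: update join, copying the id-level counts onto every row of dta
dta[counts, on = "id", c("lt", "gt") := .(i.lt, i.gt)]

print(dta)
```

Keeping `counts` as a separate table can be handy if the id-level summary is also needed on its own; the one-liner above avoids the intermediate objects.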

Answer 1: 3, accepted, 2020-03-06 21:10:17
