
Not Equal To in Relational Count using data.table

I want to use data.table to count, for each record, the rows whose value of a particular categorical variable does not match that record's value.

Take the following data.table.

library(data.table)

df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
             var2 = c(1,5,90,95,91,110,8),
             var3 = c('lamp','lamp','lamp','table','table','table','table'))

For each row, I would like to count the rows whose var2 falls within a range around that row's var2 and whose var1 differs from that row's var1.

This is related to "Count of values within specified range of value in each row using data.table". To quote the answer from @Jaap, the following code produces a count within a range.

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5)
                       , on = .(var2 >= var2min, var2 <= var2plus)
                       , .N
                       , by = .EACHI][, N]][]

In attempting to expand this answer, I succeeded in requiring an exact match on var1 with the following:

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5, var1 = var1)
                       , on = .(var2 >= var2min, var2 <= var2plus, var1 = var1)
                       , .N
                       , by = .EACHI][, N]][]

The code below is my attempt to count only the rows whose var1 is not equal to the var1 value of the given row, but this code fails.

df[, var2withinrange := df[.(var2min = var2 - 5, var2plus = var2 + 5, var1 = var1)
                       , on = .(var2 >= var2min, var2 <= var2plus, var1 != var1)
                       , .N
                       , by = .EACHI][, N]][]

How can a "not equal to" operator be added? A data.table answer is preferable, but a solution in dplyr or any other alternative would be appreciated!

In this particular case, you can do the following:

df[.(var2min = var2 - 5, var2plus = var2 + 5, v1=var1)
    , on = .(var2 >= var2min, var2 <= var2plus)
    , sum(v1 != x.var1)
    , by = .EACHI]

Output:

   var2 var2 V1
1:   -4    6  1
2:    0   10  2
3:   85   95  1
4:   90  100  2
5:   86   96  1
6:  105  115  0
7:    3   13  1
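
To get these counts back onto df as a column, as the question's code does, something like the sketch below might work. The column name var2diffvar1 and the intermediate object res are hypothetical names chosen for illustration. In j, x.var1 refers to the var1 column of the table being joined to (df), while v1 is the looked-up row's own var1 carried in through i. The two columns printed as var2 should hold the lower and upper bounds taken from i.

```r
library(data.table)

df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
                 var2 = c(1,5,90,95,91,110,8),
                 var3 = c('lamp','lamp','lamp','table','table','table','table'))

# Non-equi join of df against one range per row. For each range (by = .EACHI),
# count the matched rows of df whose var1 differs from the range owner's var1.
res <- df[.(var2min = var2 - 5, var2plus = var2 + 5, v1 = var1)
          , on = .(var2 >= var2min, var2 <= var2plus)
          , sum(v1 != x.var1)
          , by = .EACHI]

# by = .EACHI returns one row per row of i, in i's order, so V1 lines up with df.
# 'var2diffvar1' is a hypothetical column name.
df[, var2diffvar1 := res$V1][]
```

Every row falls inside its own range here, so no range is empty; if a range could match nothing, the sum would presumably come back as NA and would need handling.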

In general, I think you can do an anti-join.
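
The anti-join is not spelled out here. As a different route that is not an anti-join, and only a sketch, the two joins from the question can be combined: the count of rows in range minus the count of rows in range that share var1 leaves the count of rows in range with a different var1. The names bounds, n_all, n_same and var2diffvar1 are hypothetical.

```r
library(data.table)

df <- data.table(var1 = c('dog','cat','dog','cat','dog','dog','dog'),
                 var2 = c(1,5,90,95,91,110,8),
                 var3 = c('lamp','lamp','lamp','table','table','table','table'))

# One lookup row per record: its own var1 and the bounds of its range.
bounds <- df[, .(var1, var2min = var2 - 5, var2plus = var2 + 5)]

# Rows of df within each range, regardless of var1.
n_all <- df[bounds, on = .(var2 >= var2min, var2 <= var2plus)
            , .N, by = .EACHI]$N

# Rows of df within each range that also share the record's var1.
n_same <- df[bounds, on = .(var1, var2 >= var2min, var2 <= var2plus)
             , .N, by = .EACHI]$N

# The difference is the number of in-range rows with a different var1.
df[, var2diffvar1 := n_all - n_same][]
```

This costs two joins instead of one, but it uses only equality and range conditions in on, so it should extend to several "must differ" columns by repeating the subtraction with the relevant equality joins.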