Different data.table R results with or without quotations / How to count unique values with uniqueN

I'm trying to select all claims (ICN) with the value 7015061009422. I use the following code:

  dt[ICN==7015061009422]

And I get back:

> dt[ICN==7015061009422]
    UniversalID           ICN
 1:           2 7015061009422
 2:           3 7015061009417
 3:           2 7015061009411
 4:           2 7015061009428
 5:           2 7015061009437
 6:           4 7015061009417
 7:           5 7015061009411
 8:           6 7015061009417
 9:           6 7015061009422
10:           7 7015061009422

I finally figured out that if I put quotation marks around the value, I get what I want, which is only the rows whose ICN is actually equal to 7015061009422:

> dt[ICN=="7015061009422"]
   UniversalID           ICN
1:           2 7015061009422
2:           6 7015061009422
3:           7 7015061009422

Why does putting quotation marks around the value make such a big difference?

Sample dataset:

    library(data.table)
    # ICN is stored as a double, R's default numeric type
    ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428, 7015061009437, 7,
             7015061009417, 7015061009411, 7015061009417, 7015061009422, 7015061009422, 1)
    UniversalID <- c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8)
    dt <- cbind(UniversalID, ICN)
    dt <- as.data.table(dt)
    dt[ICN == 7015061009422]
    dt[ICN == "7015061009422"]

I'm hoping this can also help me figure out why my unique count isn't working:

> dt[,uniqueN(ICN)]
[1] 3

Clearly there are more than three distinct ICN values (the sample has seven), so why is this happening?

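When you compare very large doubles using == inside dt[...], data.table does not necessarily check for exact equality. A simple `col == value` filter can be answered with an automatic index and a binary search, and older data.table versions rounded off the last 2 bytes of every double when sorting and indexing (see ?setNumericRounding). Numbers that are merely "close" to each other then compare as equal, so the filter returns TRUE for them. The same rounding is used by uniqueN(), which explains the 3: the 70150610094xx values collapse into one group, and 7 and 1 are the other two. ICN is a double (floating-point) column here, because R stores numeric literals as doubles; base R's == on a plain numeric vector, by contrast, is exact.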
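A minimal sketch to check where the approximate match comes from, assuming the data.table package is installed. It compares the values with base R's == on a plain vector, then runs the same filter and uniqueN() inside data.table with numeric rounding explicitly switched off via setNumericRounding(0), restoring the previous setting afterwards. The variable names (base_matches, old_rounding, exact_rows, n_unique and so on) are illustrative only.

```r
library(data.table)

ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428, 7015061009437, 7,
         7015061009417, 7015061009411, 7015061009417, 7015061009422, 7015061009422, 1)
UniversalID <- c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8)
dt <- data.table(UniversalID, ICN)

# Base R's == on a plain double vector is exact: only 3 elements match
base_matches <- sum(ICN == 7015061009422)

# Base R's unique() also keeps every distinct value: 7 of them
base_unique <- length(unique(ICN))

# Remember the current data.table setting, then disable rounding of doubles
old_rounding <- getNumericRounding()
setNumericRounding(0)

exact_rows <- dt[ICN == 7015061009422]  # filter evaluated without rounding
n_unique <- dt[, uniqueN(ICN)]           # distinct count without rounding

# Restore whatever rounding was in effect before
setNumericRounding(old_rounding)

print(base_matches)
print(base_unique)
print(exact_rows)
print(n_unique)
```

If getNumericRounding() already returns 0 on your installation, the original dt[ICN==7015061009422] query should return the three exact rows as well.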

When you put quotation marks around the number, the value is a string, so R converts the numeric ICN column to character before comparing. == then does a string comparison, which is exact, and you get the answer you want.
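A short base-R sketch of that coercion (no data.table needed): comparing a double with a string converts the double with as.character(), which for this 13-digit value yields the exact digit string, so == becomes a plain text comparison. The names as_text and string_match are illustrative.

```r
ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428, 7015061009437, 7,
         7015061009417, 7015061009411, 7015061009417, 7015061009422, 7015061009422, 1)

# The double is turned into its decimal text (up to 15 significant digits)
as_text <- as.character(7015061009422)
print(as_text)  # "7015061009422"

# Mixed-type comparison: ICN is coerced to character, then compared as strings
string_match <- ICN == "7015061009422"
print(which(string_match))  # positions 1, 10 and 11
```

The matching positions correspond to UniversalID 2, 6 and 7, the same three rows the quoted query returns.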

Instead of a == b, use abs(a - b) < 0.001, or some other small tolerance.
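A minimal sketch of the tolerance approach inside a data.table query, assuming data.table is installed. The helper near_equal() is hypothetical (it is not part of base R or data.table), and its default tol = 0.001 is just the arbitrary value from the text.

```r
library(data.table)

ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428, 7015061009437, 7,
         7015061009417, 7015061009411, 7015061009417, 7015061009422, 7015061009422, 1)
UniversalID <- c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8)
dt <- data.table(UniversalID, ICN)

# Hypothetical helper: TRUE where a and b differ by less than tol
near_equal <- function(a, b, tol = 0.001) abs(a - b) < tol

# The helper is applied to the whole ICN column and returns a logical vector
close_rows <- dt[near_equal(ICN, 7015061009422)]
print(close_rows)
```

Since every ICN here is a whole number, any tolerance below 1 selects exactly the rows equal to 7015061009422.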

EDIT: Are you doing any actual arithmetic with these numbers? If not, you should probably just convert them to factors.
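A sketch of the factor suggestion, assuming data.table is installed: factor() stores each distinct ICN as a text level, so equality tests and uniqueN() work on labels rather than on doubles. The names factor_rows and n_levels are illustrative.

```r
library(data.table)

ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428, 7015061009437, 7,
         7015061009417, 7015061009411, 7015061009417, 7015061009422, 7015061009422, 1)
UniversalID <- c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8)
dt <- data.table(UniversalID, ICN)

# Replace the double column by a factor; its levels are the exact digit strings
dt[, ICN := factor(ICN)]

factor_rows <- dt[ICN == "7015061009422"]  # compares against the level label
n_levels <- dt[, uniqueN(ICN)]             # counts distinct levels present
print(factor_rows)
print(n_levels)
```

Converting with as.character() instead of factor() gives the same filter and count, at the cost of storing each label in full.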

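Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.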

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM