Different data.table results in R with or without quotation marks / How to count unique values with uniqueN
I'm trying to select all claims (ICN) with the value of 7015061009422. I use the following code:
dt[ICN==7015061009422]
And I get back:
> dt[ICN==7015061009422]
    UniversalID           ICN
 1:           2 7015061009422
 2:           3 7015061009417
 3:           2 7015061009411
 4:           2 7015061009428
 5:           2 7015061009437
 6:           4 7015061009417
 7:           5 7015061009411
 8:           6 7015061009417
 9:           6 7015061009422
10:           7 7015061009422
I finally figured out that if I put quotation marks around the value, I get what I actually want, namely only the rows whose ICN is equal to 7015061009422:
> dt[ICN=="7015061009422"]
   UniversalID           ICN
1:           2 7015061009422
2:           6 7015061009422
3:           7 7015061009422
Why does putting quotation marks around the value make such a big difference?
Sample dataset:
library(data.table)

ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428, 7015061009437, 7, 7015061009417, 7015061009411, 7015061009417, 7015061009422, 7015061009422, 1)
UniversalID <- c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8)
# Both columns are numeric (double), exactly as with cbind() followed by as.data.table()
dt <- data.table(UniversalID, ICN)
dt[ICN == 7015061009422]
dt[ICN == "7015061009422"]
I'm hoping that this can also help me figure out why my unique count isn't working:
> dt[, uniqueN(ICN)]
[1] 3
Very clearly, there are more than three different ICN values, so why is this happening?
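When you compare very large numbers using == inside dt[...], the comparison is not an exact one. Base R's == on two doubles does test exact equality, and 7015061009422 is far below 2^53, so it is stored exactly. The "closeness" comes from data.table: as far as I can tell, older versions (before v1.9.8) round numeric values by default when subsetting with ==, grouping, joining or counting unique values, ignoring the last 2 bytes of the significand. That leaves roughly 11 significant digits, so 13-digit values that differ only in their final digits are treated as equal and the comparison returns TRUE. This is also why uniqueN(ICN) reports 3: the five large ICN values collapse into one, and 7 and 1 make up the other two. Any floating point column gets the same treatment.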
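A minimal sketch of how one might check where the approximate matching comes from, using the sample data from the question. It assumes a data.table version that provides setNumericRounding() and getNumericRounding(); the variable names (base_matches, exact_rows, exact_unique) are made up for illustration.

```r
library(data.table)

ICN <- c(7015061009422, 7015061009417, 7015061009411, 7015061009428,
         7015061009437, 7, 7015061009417, 7015061009411, 7015061009417,
         7015061009422, 7015061009422, 1)
dt <- data.table(UniversalID = c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8), ICN = ICN)

# Base R: == on doubles is exact, so only the true matches are counted.
base_matches <- sum(ICN == 7015061009422)

# Ask data.table to compare doubles without any rounding.
old_rounding <- getNumericRounding()
setNumericRounding(0)
exact_rows   <- dt[ICN == 7015061009422]
exact_unique <- dt[, uniqueN(ICN)]
setNumericRounding(old_rounding)  # restore the previous setting
```

With rounding switched off, the subset should contain only the rows whose ICN is really 7015061009422, and the unique count should cover every distinct ICN in the sample.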
When you put quotation marks around your number, you are comparing the column against a string. R then converts the numeric values to character, so calling == does a string comparison, and you get the answer you want.
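To see the string comparison in isolation, here is a small base R sketch. The vector icn is a hypothetical three-value excerpt of the ICN column, not part of the original post.

```r
# Hypothetical excerpt of the ICN column from the question
icn <- c(7015061009422, 7015061009417, 7)

# When one side of == is a string, R converts the numbers to text first.
icn_text <- as.character(icn)

# The comparison is then done on the text, so only identical digits match.
string_match <- icn == "7015061009422"
```

Here only the first element matches, because "7015061009417" and "7" are different strings no matter how close the underlying numbers are.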
Instead of a == b, you can use abs(a - b) < 0.001, or some other small tolerance of your choosing.
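As a rough illustration of the tolerance approach in base R: the values a and b and the tolerance tol below are arbitrary choices for the example, not something taken from the question's data.

```r
a <- 0.1 + 0.2
b <- 0.3

# Exact comparison of the two doubles; rounding error makes them differ.
exact_equal <- a == b

# Tolerance-based comparison; tol is an arbitrary value picked for this sketch.
tol <- 0.001
near_equal <- abs(a - b) < tol

# Base R also ships a tolerance-based helper.
base_near <- isTRUE(all.equal(a, b))
```

The tolerance has to suit the data: for whole-number identifiers such as ICN, anything below 1 keeps distinct values apart.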
EDIT: Are you doing any actual arithmetic with these numbers? If not, you should probably just convert them to factors.
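A possible way to do that with the sample data, sketched here with a character column (as.factor(ICN) could be swapped in if factors are preferred). The names n_unique and matches are illustrative only.

```r
library(data.table)

dt <- data.table(
  UniversalID = c(2, 3, 2, 2, 2, 1, 4, 5, 6, 6, 7, 8),
  ICN = c(7015061009422, 7015061009417, 7015061009411, 7015061009428,
          7015061009437, 7, 7015061009417, 7015061009411, 7015061009417,
          7015061009422, 7015061009422, 1)
)

# Store the identifiers as text so they are matched digit for digit.
dt[, ICN := as.character(ICN)]

n_unique <- dt[, uniqueN(ICN)]          # count of distinct identifiers
matches  <- dt[ICN == "7015061009422"]  # rows for one specific claim
```

Since the identifiers are no longer doubles, no numeric rounding is involved in the subset or the unique count.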