Merging data.tables by numeric column when machine tolerance needs to be accounted for
Many have seen the issue with using == to compare two floating point numbers: == fails to return TRUE, but all.equal works.
x <- sqrt(2)
x^2 == 2
#> [1] FALSE
all.equal(x^2, 2)
#> [1] TRUE
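To make the difference between the two comparisons concrete, here is a minimal sketch of a tolerance-based check. The name tol and its value are assumptions chosen for illustration; the value has the same magnitude as the default tolerance of all.equal(), although all.equal() compares relative rather than absolute differences.

```r
x <- sqrt(2)

# Hypothetical tolerance for this sketch (an assumption, not part of the
# original question); same magnitude as all.equal()'s default tolerance.
tol <- sqrt(.Machine$double.eps)

# The rounding error is tiny but not zero, which is why == returns FALSE.
err <- abs(x^2 - 2)

# Vectorised comparison within an absolute tolerance.
near_equal <- err < tol

# all.equal() returns a character description instead of FALSE on a
# mismatch, so wrap it in isTRUE() before using it in a condition.
same <- isTRUE(all.equal(x^2, 2))
different <- isTRUE(all.equal(1, 1.1))
```

Unlike all.equal(), the abs() comparison works elementwise on whole columns, which is the kind of test a tolerance-aware join would need.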
My issue comes from the need to join two data.tables by a numeric column where == will fail to find the matching pairs.
I have considered coercing the numeric values to characters, but that option has too many other potential errors. I have considered rounding the values, but that too, in the application I need, will create more problems.
Here is a simple example of a join that is failing because DT1$x == DT2$x will return FALSE when it would be preferable to have the return be TRUE.
library(data.table)
packageVersion("data.table")
#> [1] '1.12.8'
DT1 <- data.table(x = sqrt(1:10), v1 = 1:10)
DT2 <- data.table(x = 1:10, v2 = LETTERS[1:10])
# set x to its square
DT1[, x := x^2]
# left join
merge(DT1, DT2, by = "x", all.x = TRUE)
#> x v1 v2
#> 1: 1 1 A
#> 2: 2 2 <NA>
#> 3: 3 3 <NA>
#> 4: 4 4 D
#> 5: 5 5 <NA>
#> 6: 6 6 <NA>
#> 7: 7 7 <NA>
#> 8: 8 8 <NA>
#> 9: 9 9 I
#> 10: 10 10 <NA>
How can I specify a left join by a numeric column key such that the machine tolerance in the comparison is accounted for?
Created on 2020-04-06 by the reprex package (v0.3.0)
You could use roll = "nearest". Note that only the last column specified in on = can be rolling.
library(data.table)
DT1[DT2, on = "x", roll = "nearest"]
x v1 v2
1: 1 1 A
2: 2 2 B
3: 3 3 C
4: 4 4 D
5: 5 5 E
6: 6 6 F
7: 7 7 G
8: 8 8 H
9: 9 9 I
10: 10 10 J
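The note about on = can be illustrated with a join on two columns. This is only a sketch with made-up tables A and B and a hypothetical grouping column g: g is matched exactly, and only x, listed last in on =, is rolled to the nearest value.

```r
library(data.table)

# Hypothetical tables for illustration; x in A carries floating point
# error because it is computed as sqrt(.)^2.
A <- data.table(g = c("a", "a", "b"), x = sqrt(c(2, 3, 2))^2, v1 = 1:3)
B <- data.table(g = c("a", "b"), x = c(2, 2), v2 = c("A", "B"))

# g must match exactly; x is the last column in on =, so it is the one
# that rolls to the nearest value within each matching g.
res <- A[B, on = .(g, x), roll = "nearest"]
res
```

If the numeric column were listed first in on =, it would have to match exactly and the rolling would apply to the other column instead.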
I suspect the problem is more complicated than this simple case, but you could subsequently filter out joined rows whose difference exceeds a certain threshold.
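One way such a filter might look is sketched below. The tolerance tol, the helper column x2, and the modified DT2 (its last value changed to 10.5 so that one row has no close match) are all assumptions made for illustration. The join is written as DT2[DT1, ...] so that every row of DT1 is kept, which corresponds to the left join asked for in the question.

```r
library(data.table)

DT1 <- data.table(x = sqrt(1:10)^2, v1 = 1:10)
# Assumed variation of the example data: 10.5 has no close match in DT1.
DT2 <- data.table(x = c(1:9, 10.5), v2 = LETTERS[1:10])

# Hypothetical tolerance; pick a value suited to the scale of the data.
tol <- 1e-8

# Keep a copy of DT2's key, because the join column in the result
# holds the values from DT1.
DT2[, x2 := x]

# For every row of DT1, take the nearest row of DT2.
res <- DT2[DT1, on = "x", roll = "nearest"]

# Blank out matches that are further away than the tolerance,
# then drop the helper column.
res[abs(x - x2) > tol, v2 := NA_character_]
res[, x2 := NULL]
res
```

Rows one to nine keep their letters because the differences are on the order of machine precision, while the last row is set to NA because its nearest neighbour is 0.5 away.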
Data
DT1 <- data.table(x = sqrt(1:10), v1 = 1:10)
DT2 <- data.table(x = 1:10, v2 = LETTERS[1:10])
DT1[, x := x^2]