R: the number of significant digits leads to unexpected results for inequalities evaluated with eval and parse(text = ...)

I am working on Boolean rules related to terminal node assignment for CART-like trees related to my work (http://web.ccs.miami.edu/~hishwaran/ishwaran.html).

I have noticed problematic behavior when evaluating inequalities stored as character strings using eval and parse(text = ...). The issue has to do with how R converts the internal representation of a number to text.

Here's an example involving the number pi. I want to check whether each element of a vector (which I call x) is less than or equal to pi.

> pi
> [1] 3.141593
> rule = paste0("x <= ", pi)
> rule
> [1] "x <= 3.14159265358979"

This rule checks whether the object x is less than or equal to pi, where pi is represented to 14 decimal places. Now I will assign x the values 1, 2, 3 and pi:

> x = c(1,2,3,pi)

Here is x printed to 15 significant digits:

> print(x, digits=15)
> [1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979

Now let's evaluate the rule:

> eval(parse(text = rule))
> [1] TRUE TRUE TRUE FALSE

Whooaaaaa, it looks like pi is not less than or equal to pi. Right?

But if I hard-code the last element of x as pi to 14 decimal places, it works:

> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
> [1] TRUE TRUE TRUE TRUE

Obviously, in the first case the internal representation of pi carries more digits than the number written into the rule string, so when R evaluates the expression, the stored pi is greater than the shortened value and the comparison returns FALSE. In the second case both sides come from the same 15-digit literal, so the result is TRUE.

However, how can I avoid this? I really need the first evaluation to come back TRUE, because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.

One solution I use is to add a small tolerance value.

> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
> [1] TRUE TRUE TRUE TRUE

However, this seems like an ugly solution.

Any comments and suggestions are greatly appreciated!

You could refer to pi by name, or go through a function instead, so that `pi` never gets stringified (which is your first problem here):

``````
rule <- "x <= pi"
x <- c(1, 2, 3, pi)

eval(parse(text = rule)) ## All TRUE

## Another way is to put whatever must stay unevaluated into a function or a block:

my_pi <- function() {
  pi
}

rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE
``````

You will still suffer from the usual floating point issues, but imprecise stringification won't be your problem anymore.
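If the threshold is computed at run time and has no convenient name, one possible variation on the same idea is to build the rule as an unevaluated call rather than as text, so the number is never converted to a string at all. The sketch below is only an illustration: `threshold` is a hypothetical stand-in for whatever split value your code produces, and `bquote()` is used here as a swapped-in technique, not something from the question.

```r
# Hypothetical threshold computed at run time (stands in for a split value).
threshold <- pi
x <- c(1, 2, 3, pi)

# bquote() splices the numeric value itself into the call,
# so there is no round trip through text.
rule <- bquote(x <= .(threshold))

# Evaluating the call compares x against the full-precision double.
result <- eval(rule)
print(result)

# A text version is still available for printing or logging,
# but it should be treated as display only, not parsed back.
rule_text <- deparse(rule)
print(rule_text)
```

The call can be stored in a list and evaluated later with `eval()`, optionally passing a data frame or environment as the second argument so that `x` is looked up there.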

Here's why your approach didn't work:

``````
> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074

``````

The stringified pi is less than R's pi by about 3e-15, which is several times the spacing between adjacent doubles near pi.

The manual for paste says it uses as.character to convert numbers to strings, and the as.character documentation in turn says it uses 15 significant digits, which is what you are observing.
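If the rule really has to be a string, a rough sketch of one workaround is to format the number yourself with more digits. Seventeen significant digits are generally enough to identify a double uniquely, but whether R's parser recovers exactly the same double can depend on the platform, so treat the last comparison below as something to check rather than a guarantee. The variable names are illustrative only.

```r
s15 <- as.character(pi)       # what paste0() embeds in the rule
s17 <- sprintf("%.17g", pi)   # the same number with 17 significant digits

back15 <- as.numeric(s15)
back17 <- as.numeric(s17)

# The 15-digit text parses to a double strictly below pi.
print(back15 < pi)

# The 17-digit text lands much closer to the stored value.
print(abs(back17 - pi) < abs(back15 - pi))

# Worth verifying on your own platform before relying on it.
print(back17 == pi)

# Building the rule from the longer string (sketch only).
rule <- paste0("x <= ", s17)
print(rule)
```

Keeping the value out of the string altogether avoids the question of parser accuracy, so this is best seen as a fallback for cases where the rules must be stored as text.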