
R: the number of significant digits leads to unexpected results for inequalities evaluated with eval and parse(text = ...)

I am working on boolean rules for terminal node assignment in CART-like trees, related to my work ( http://web.ccs.miami.edu/~hishwaran/ishwaran.html ).

I have noticed problematic behavior when evaluating inequalities stored as character strings using eval and parse(text = ...). The issue has to do with how R converts the internal representation of a number to text.

Here's an example involving the number pi. I want to check whether each element of a vector (which I call x) is less than or equal to pi.

> pi
[1] 3.141593
> rule = paste0("x <= ", pi)
> rule
[1] "x <= 3.14159265358979"

This rule checks whether the object x is less than or equal to pi, where pi is written out to 14 decimal places (15 significant digits). Now I assign the values 1, 2, 3 and pi to x:

> x = c(1,2,3,pi)

Here is x printed to 15 significant digits:

> print(x, digits=15)
[1] 1.00000000000000 2.00000000000000 3.00000000000000 3.14159265358979

Now let's evaluate the rule:

> eval(parse(text = rule))
[1] TRUE TRUE TRUE FALSE

Whooaaaaa, it looks like pi is not less than or equal to pi. Right?

But if I hard-code the last element of x as pi to 14 decimal places, it works:

> x = c(1,2,3,3.14159265358979)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE

Apparently, in the first case the internal representation of pi carries more digits than the 15 written into the rule, so when R evaluates the expression the stored pi is greater than the truncated value and the comparison returns FALSE. In the second case both sides come from the same 15-digit literal, so the result is TRUE.

How can I avoid this, though? I really need the first evaluation to come back TRUE, because I am automating this process for rule-based inference and I cannot hard-code a value (here, pi) each time.

One workaround I use is to add a small tolerance value:

> tol = sqrt(.Machine$double.eps)
> rule = paste0("x <= ", pi + tol)
> x = c(1,2,3,pi)
> eval(parse(text = rule))
[1] TRUE TRUE TRUE TRUE

However, this seems like an ugly solution.

Any comments and suggestions are greatly appreciated!

You could refer to pi by name, or go through a function, so that pi never gets stringified (which is your first problem here):


rule <- "x <= pi"
x <- c(1, 2, 3, pi)

eval(parse(text = rule)) ## All TRUE

## Another way is to wrap whatever must stay unevaluated in a function or a block:

my_pi <- function() {
    pi
}

rule <- "x <= my_pi()"
eval(parse(text = rule)) ## All TRUE


You will still be subject to the usual floating point issues, but imprecise stringification won't be your problem anymore.
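Since the thresholds are generated automatically rather than typed in by name, one way to apply the same idea is to build the rule as a call instead of a string. The following is only a sketch under that assumption: `threshold` is a hypothetical stand-in for whatever split value your tree code produces, and `bquote()` is base R.

```r
# Hypothetical threshold; in practice this would come from the fitted tree.
threshold <- pi
x <- c(1, 2, 3, pi)

# bquote() substitutes the value of .(threshold) into the call as a
# numeric constant, so the double is never converted to text.
rule <- bquote(x <= .(threshold))

# The rule is already a language object, so no parse() step is needed.
result <- eval(rule)
print(result)
```

Printing `rule` should still display the constant with about 15 digits, because that is how calls are deparsed for display, but the value stored inside the call is the full double.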

Here's why your approach didn't work:


> print( pi, digits=20 )
[1] 3.141592653589793116
> print( eval(parse(text=pi)), digits=20 )
[1] 3.1415926535897900074

The stringified pi is smaller than R's pi by about 3.1e-15, which is several times the spacing between adjacent doubles near pi (about 4.4e-16).

The manual for paste says it uses as.character to convert numbers to strings, and the as.character documentation in turn says it uses 15 significant digits, which is what you are observing.
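If the rule really has to be a string, a possible workaround is to write the number with 17 significant digits instead of the default 15, since 17 digits are enough to identify an IEEE 754 double uniquely. This is a sketch rather than a guarantee: it assumes R's text-to-number conversion recovers the original double from the 17-digit string, which I would verify on your own platform before relying on it.

```r
x <- c(1, 2, 3, pi)

# Default conversion: 15 significant digits, which loses information.
short <- as.character(pi)
lossy <- as.numeric(short) == pi   # FALSE: the round trip changed the value

# "%.17g" requests 17 significant digits.
long <- sprintf("%.17g", pi)
rule <- paste0("x <= ", long)

result <- eval(parse(text = rule))
print(rule)
print(result)
```

Building the rule this way keeps it a plain string, at the cost of longer numbers in the rule text.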
