How can I work around a round-off error that causes an infinite loop in Perl's Statistics::Descriptive?

I'm using the Statistics::Descriptive library in Perl to calculate frequency distributions and am running into a floating-point rounding problem.

I pass in two values, 0.205 and 0.205 (taken from other numbers and formatted to those with sprintf), to the stats module and ask it to calculate the frequency distribution, but it gets stuck in an infinite loop.

Stepping through with the debugger, I can see that it's doing:

my $interval = $self->{sample_range}/$partitions;
my $iter = $self->{min};
while (($iter += $interval) < $self->{max}) {
  $bins{$iter} = 0;
  push @k, $iter;  ##Keep the "keys" unstringified
}

$self->sample_range (the range, max - min) is returning 2.77555756156289e-17 rather than 0 as I'd expect. This means that the loop condition (($iter += $interval) < $self->{max}) never becomes false, so the loop is, for all intents and purposes, infinite.

DB<8> print $self->{max};
0.205
DB<9> print $self->{min};
0.205
DB<10> print $self->{max}-$self->{min};
2.77555756156289e-17
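The debugger output can be reproduced without the module. This is a minimal sketch assuming the two inputs are 0.205 and the next representable double above it (the original data isn't shown). The partition count of 10 and the iteration cap are hypothetical additions so the script terminates; the loop condition itself mirrors the one quoted above.

```perl
use strict;
use warnings;

# 0.205 lies in [0.125, 0.25), where adjacent doubles are 2**-55 apart,
# so adding 2**-55 yields exactly the next representable value.
my $min = 0.205;
my $max = $min + 2**-55;

my $range = $max - $min;
print "$min $max\n";    # both stringify as 0.205 (Perl's default 15 digits)
print "$range\n";       # the same tiny range the debugger showed

# Same loop shape as the module's, but capped so it cannot hang.
my $partitions = 10;    # hypothetical partition count
my $interval   = $range / $partitions;
my $iter       = $min;
my $steps      = 0;
while (($iter += $interval) < $max) {
    last if ++$steps >= 1_000;    # safety cap, not in the original loop
}
print "still at $iter after $steps steps\n";
```

Each step is smaller than half the gap between the two doubles, so `$iter += $interval` rounds back to `$min` every time and the condition never changes.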

So this looks like a rounding problem. I can't think how to fix it on my side, though, and I'm not sure editing the library is a good idea. I'm looking for suggestions for a workaround or alternative.

Cheers, Neil

I am the Statistics::Descriptive maintainer. Because of the module's numeric nature, many rounding problems have been reported. I believe this particular one was fixed in a version I released recently, newer than the one you are using, which computes the division points by multiplication instead of accumulating them with +=.
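The idea of computing each division point directly, rather than accumulating it, can be sketched as follows. `bin_boundaries` is a hypothetical helper, not the module's actual code, and ending the list with `max` itself is an assumption about how the last bin is handled.

```perl
use strict;
use warnings;

# Hypothetical helper (not Statistics::Descriptive's code): compute each
# boundary as min + i * interval instead of repeatedly doing += interval.
# The loop is bounded by the partition count, so it always terminates,
# even when the interval is too small to move the sum.
sub bin_boundaries {
    my ($min, $max, $partitions) = @_;
    my $interval = ($max - $min) / $partitions;
    my @k;
    for my $i (1 .. $partitions - 1) {
        push @k, $min + $i * $interval;
    }
    push @k, $max;    # assumed: the last boundary is max itself
    return @k;
}

my @k = bin_boundaries(0.205, 0.205 + 2**-55, 10);
print scalar(@k), " boundaries\n";
```

With the degenerate input above, several boundaries collapse onto the same value, but the function still returns after a fixed number of steps; with an ordinary range such as `bin_boundaries(0, 10, 5)` it yields 2, 4, 6, 8, 10.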

Please use the most up-to-date version from CPAN; it should behave better.

It's not exactly a rounding problem; you can see the more precise values with something like:

printf("%.18g %.18g", $self->{max}, $self->{min});
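A self-contained version of that check, assuming (as the debugger output suggests) that the inputs are 0.205 and the next representable double above it:

```perl
use strict;
use warnings;

# Assumed inputs: 0.205 and the adjacent double just above it.
my $min = 0.205;
my $max = $min + 2**-55;

# Default stringification (15 significant digits) hides the difference...
my $default = "$min / $max";
# ...while 18 significant digits is enough to show the values are distinct.
my $precise = sprintf("%.18g / %.18g", $min, $max);

print "$default\n$precise\n";
```

17 significant digits are always enough to tell two doubles apart, which is why `%.18g` exposes what the default `print` hides.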

It looks to me like the module has a flaw: it assumes the sample range can always be divided into $partitions pieces. Because floating point doesn't have infinite precision, that isn't always possible. In your case, the min and max values are exactly adjacent representable values, so there can't be more than one partition. I don't know exactly what the module uses the partitions for, so I'm not sure what the impact of this may be. Another possible problem in the module is that it uses numbers as hash keys, which implicitly stringifies them and slightly rounds the values.
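Both points can be checked directly. This sketch again assumes the inputs are 0.205 and its adjacent double; the range of partition counts (3 to 10) is an arbitrary illustrative choice. Two partitions are left out because the half-way step is an exact tie whose rounding direction depends on the low bit of 0.205's representation.

```perl
use strict;
use warnings;

my $min   = 0.205;            # assumed input
my $max   = $min + 2**-55;    # the next representable double
my $range = $max - $min;

# With 3 or more partitions, each step is under half the gap between
# $min and $max, so $min + step rounds straight back to $min.
my @absorbed = grep { $min + $range / $_ == $min } 3 .. 10;
print scalar(@absorbed), " of 8 partition counts make no progress\n";

# Numeric hash keys are stringified with 15 significant digits,
# so two distinct values can land on the same key.
my %bins;
$bins{$min} = 0;
$bins{$max} = 0;
print scalar(keys %bins), " key(s): ", join(",", keys %bins), "\n";
```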

You may have some success laundering your data through stringification before feeding it to the module:

$data = 0+"$data";

This will at least ensure that two numbers that appear equal (at the default printing precision) actually are equal.
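Applied to a whole list, the laundering step might look like this. It is a minimal sketch; the sample values are assumed to be the adjacent doubles from the question. It discards any precision beyond Perl's default 15 significant digits.

```perl
use strict;
use warnings;

# Assumed sample data: two adjacent doubles that both print as 0.205.
my @data = (0.205, 0.205 + 2**-55);

# Round-trip each value through its default string form, so values
# that print the same become numerically identical.
@data = map { 0 + "$_" } @data;

print $data[0] == $data[1] ? "equal\n" : "different\n";
```

Judging only from the loop quoted in the question, a range of exactly 0 makes `$iter += $interval` leave `$iter` equal to max, so the `<` test fails immediately and the loop body never runs. Newer module versions may handle this case differently.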

That shouldn't cause an infinite loop. What would make the loop infinite is if $self->{sample_range}/$partitions were 0.

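Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.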

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM