
Hamming numbers and double precision

I was playing around with generating Hamming numbers in Haskell, trying to improve on the obvious (pardon the naming of the functions):

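mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq xs [] = xs                -- base cases make the function total
mergeUniq [] ys = ys
mergeUniq (x:xs) (y:ys) = case x `compare` y of
                               EQ -> x : mergeUniq xs ys   -- keep one copy of a duplicate
                               LT -> x : mergeUniq xs (y:ys)
                               GT -> y : mergeUniq (x:xs) ys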

powers :: [Integer]
powers = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where
    expand factor = (factor *) <$> powers

I noticed that I can avoid the (slower) arbitrary-precision Integer if I represent the numbers as triples of the 2-, 3- and 5-exponents, like data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int }, where the number is understood to be $2^{k_2} \cdot 3^{k_3} \cdot 5^{k_5}$. The comparison of two Powers then becomes:

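{-# LANGUAGE RecordWildCards #-}  -- needed for the Power { .. } pattern below

data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int } deriving (Eq, Show)

instance Ord Power where
  p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
    where
      divP a b = Power { k2 = k2 a - k2 b, k3 = k3 a - k3 b, k5 = k5 a - k5 b }  -- division = exponent subtraction
      gcdP = Power { k2 = min (k2 p1) (k2 p2), k3 = min (k3 p1) (k3 p2), k5 = min (k5 p1) (k5 p2) }
      toComp Power { .. } = fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5 :: Double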

So, very roughly speaking, to compare $p_1 = 2^{i_1} \cdot 3^{j_1} \cdot 5^{k_1}$ and $p_2 = 2^{i_2} \cdot 3^{j_2} \cdot 5^{k_2}$ we compare the logarithms of $p_1$ and $p_2$, which presumably fit in a Double. But actually we do even better: we first compute their GCD (by taking the min of each pair of corresponding exponents, so far only Int arithmetic!), divide $p_1$ and $p_2$ by the GCD (by subtracting the mins from the corresponding exponents, again only Int arithmetic), and compare the logarithms of the results.
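Written out for $p_1 = 2^{i_1} \cdot 3^{j_1} \cdot 5^{k_1}$ and $p_2 = 2^{i_2} \cdot 3^{j_2} \cdot 5^{k_2}$, with the shorthand $m_2 = \min(i_1, i_2)$, $m_3 = \min(j_1, j_2)$, $m_5 = \min(k_1, k_2)$ introduced here for illustration, the GCD step amounts to the following. For each prime, at least one of the two reduced exponents is zero, which is what keeps the compared logarithms small.

$$\gcd(p_1, p_2) = 2^{m_2} \cdot 3^{m_3} \cdot 5^{m_5}$$

$$p_1 < p_2 \iff (i_1 - m_2)\ln 2 + (j_1 - m_3)\ln 3 + (k_1 - m_5)\ln 5 \;<\; (i_2 - m_2)\ln 2 + (j_2 - m_3)\ln 3 + (k_2 - m_5)\ln 5$$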

But, given that we go through Doubles, there will be a loss of precision eventually. And this is the ground for my questions:

  1. When will the finite precision of Doubles bite me? That is, how can I estimate the order of i, j, k at which comparisons of $2^i \cdot 3^j \cdot 5^k$ with numbers of "similar" exponents become unreliable?
  2. How does the fact that we divide by the GCD (which presumably lowers the exponents considerably for this task) modify the answer to the previous question?

I did an experiment, comparing the numbers produced this way with those produced via arbitrary-precision arithmetic, and all Hamming numbers up to the 1'000'000'000th match exactly (verifying this took me about 15 minutes and 600 MB of RAM). But that's obviously not a proof.
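A scaled-down version of that experiment could look like the sketch below. It repeats the Power type and the GCD-reducing Ord instance from the question so the block compiles on its own, adds base cases to mergeUniq, and generates Hamming numbers as exponent triples by bumping one exponent instead of multiplying. The names hammingP, toIntegerP and checkFirst are hypothetical, not from the original post.

```haskell
{-# LANGUAGE RecordWildCards #-}
-- Sketch: generate Hamming numbers as exponent triples using the log-based
-- Ord instance, and cross-check them against the Integer-based generator.

data Power = Power { k2 :: !Int, k3 :: !Int, k5 :: !Int } deriving (Eq, Show)

instance Ord Power where
  p1 `compare` p2 = toComp (p1 `divP` gcdP) `compare` toComp (p2 `divP` gcdP)
    where
      divP a b = Power (k2 a - k2 b) (k3 a - k3 b) (k5 a - k5 b)
      gcdP = Power (min (k2 p1) (k2 p2)) (min (k3 p1) (k3 p2)) (min (k5 p1) (k5 p2))
      toComp Power { .. } =
        fromIntegral k2 * log 2 + fromIntegral k3 * log 3 + fromIntegral k5 * log 5 :: Double

mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq xs [] = xs
mergeUniq [] ys = ys
mergeUniq (x:xs) (y:ys) = case x `compare` y of
  EQ -> x : mergeUniq xs ys
  LT -> x : mergeUniq xs (y:ys)
  GT -> y : mergeUniq (x:xs) ys

-- Reference generator using arbitrary-precision Integer.
hammingI :: [Integer]
hammingI = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where expand f = (f *) <$> hammingI

-- Same recursion, but "multiply by p" means "increment the p-exponent".
hammingP :: [Power]
hammingP = Power 0 0 0 : bump2 `mergeUniq` bump3 `mergeUniq` bump5
  where
    bump2 = (\p -> p { k2 = k2 p + 1 }) <$> hammingP
    bump3 = (\p -> p { k3 = k3 p + 1 }) <$> hammingP
    bump5 = (\p -> p { k5 = k5 p + 1 }) <$> hammingP

-- Exact value of a triple.
toIntegerP :: Power -> Integer
toIntegerP (Power a b c) = 2 ^ a * 3 ^ b * 5 ^ c

-- True if the first n numbers of both generators agree.
checkFirst :: Int -> Bool
checkFirst n = map toIntegerP (take n hammingP) == take n hammingI

main :: IO ()
main = print (checkFirst 100000)
```

A larger n scales the check up, at the cost of time and memory, since both lists are top-level values and are retained while being compared.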

Empirically, it's above roughly the 10 trillionth Hamming number, possibly higher.

Using your nice GCD trick won't help us here, because some neighboring Hamming numbers are bound to have no common factors between them.
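To see such pairs concretely, this sketch lists consecutive Hamming numbers whose GCD is 1; for these pairs the GCD reduction subtracts nothing, so the full logarithms are compared. It reuses the Integer generator from the question (with base cases added to mergeUniq); coprimeNeighbours is a hypothetical helper name.

```haskell
-- Sketch: consecutive Hamming numbers that share no common factor.

mergeUniq :: Ord a => [a] -> [a] -> [a]
mergeUniq xs [] = xs
mergeUniq [] ys = ys
mergeUniq (x:xs) (y:ys) = case compare x y of
  EQ -> x : mergeUniq xs ys
  LT -> x : mergeUniq xs (y:ys)
  GT -> y : mergeUniq (x:xs) ys

hamming :: [Integer]
hamming = 1 : expand 2 `mergeUniq` expand 3 `mergeUniq` expand 5
  where expand f = (f *) <$> hamming

-- Coprime pairs among the first n neighbouring pairs of the sequence.
coprimeNeighbours :: Int -> [(Integer, Integer)]
coprimeNeighbours n =
  [ (a, b) | (a, b) <- take n (zip hamming (tail hamming)), gcd a b == 1 ]

main :: IO ()
main = print (coprimeNeighbours 100)
```

Pairs such as (8, 9), (80, 81) and (243, 250) show up: a pure power of 3 next to a number built only from 2s and 5s, so their GCD is 1.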

Update: trying it online on ideone and elsewhere, we get:

4T  5.81s 22.2MB  -- 16 digits used.... still good
                  --  (as evidenced by the `True` below), but really pushing it.
((True,44531.6794,7.275957614183426e-11),(16348,16503,873),"2.3509E+13405")
-- isTruly  max        min logval           nth-Hamming       approx.
--  Sorted   logval      difference          as i,j,k          value
--            in band      in band                             in decimal
10T   11.13s 26.4MB
((True,60439.6639,7.275957614183426e-11),(18187,23771,1971),"1.4182E+18194")
13T   14.44s 30.4MB    ...still good
((True,65963.6432,5.820766091346741e-11),(28648,21308,1526),"1.0845E+19857")

---- same code on tio:
10T   16.77s
35T   38.84s 
((True,91766.4800,5.820766091346741e-11),(13824,2133,32112),"2.9045E+27624")
70T   59.57s
((True,115619.1575,5.820766091346741e-11),(13125,13687,34799),"6.8310E+34804")

---- on home machine:
100T: 368.13s
((True,130216.1408,5.820766091346741e-11),(88324,876,17444),"9.2111E+39198")

140T: 466.69s
((True,145671.6480,5.820766091346741e-11),(9918,24002,42082),"3.4322E+43851")

170T: 383.26s         ---FAULTY---
((False,155411.2501,0.0),(77201,27980,14584),"2.80508E+46783")
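One way to read the "min logval difference" column above is in units of the spacing between adjacent Doubles (one ulp) at the magnitude of the logs in that band. The sketch assumes the column is the smallest gap between neighbouring log values the program saw; ulp is a hypothetical helper built on the standard decodeFloat/encodeFloat. Read this way, the gap is 10 ulps at 4T and 10T, 4 ulps from 13T through 100T, 2 ulps at 140T, and 0 at 170T, where sorting broke. Since each log is a sum of three rounded products, an error of a few ulps is to be expected, so 2 ulps is plausibly right at the edge.

```haskell
-- Sketch: express the observed minimum gaps in ulps of the log magnitude.

-- Spacing between adjacent Doubles at x (x > 0). decodeFloat gives
-- x = m * 2^e with a 53-bit significand m, so one ulp is 2^e.
ulp :: Double -> Double
ulp x = encodeFloat 1 e
  where (_, e) = decodeFloat x

-- (run, max logval in band, min difference in band), copied from the output above
runs :: [(String, Double, Double)]
runs =
  [ ("4T",   44531.6794,  7.275957614183426e-11)
  , ("13T",  65963.6432,  5.820766091346741e-11)
  , ("140T", 145671.6480, 5.820766091346741e-11)
  , ("170T", 155411.2501, 0.0) ]

main :: IO ()
main = mapM_ report runs
  where
    report (label, v, d) =
      putStrLn (label ++ ": ulp = " ++ show (ulp v)
                ++ ", min difference = " ++ show (d / ulp v) ++ " ulps")
```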

I guess you could use adaptive arbitrary precision to compute the log.
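A simpler relative of that idea, swapped in here rather than implementing adaptive-precision logarithms, is to trust the Double comparison only when the gap is clearly larger than the rounding error and otherwise decide with exact Integer arithmetic. In the sketch, Power (a plain triple here), comparePower and the 64-ulp safety margin are all illustrative assumptions, not a proven error bound.

```haskell
-- Sketch: Double comparison with an exact Integer fallback for close calls.

data Power = Power !Int !Int !Int deriving (Eq, Show)

-- natural log of 2^a * 3^b * 5^c, approximated in Double
logP :: Power -> Double
logP (Power a b c) = fromIntegral a * log 2 + fromIntegral b * log 3 + fromIntegral c * log 5

-- exact value (can be huge)
exactP :: Power -> Integer
exactP (Power a b c) = 2 ^ a * 3 ^ b * 5 ^ c

-- spacing between adjacent Doubles at magnitude x (x > 0)
ulp :: Double -> Double
ulp x = encodeFloat 1 (snd (decodeFloat x))

comparePower :: Power -> Power -> Ordering
comparePower p q
  | abs d > margin = compare d 0                   -- gap is safely resolvable
  | otherwise      = compare (exactP p) (exactP q) -- too close: decide exactly
  where
    lp = logP p
    lq = logP q
    d = lp - lq
    big = max lp lq
    -- assumed safety margin: 64 ulps of the larger log (0 when both are 1)
    margin = if big > 0 then 64 * ulp big else 0

main :: IO ()
main = do
  print (comparePower (Power 3 0 0) (Power 0 2 0))  -- 8 vs 9
  print (comparePower (Power 0 5 0) (Power 1 0 3))  -- 243 vs 250
```

The exact branch is expensive for large exponents, but if the margin is chosen well it should only run for the rare near-ties.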

If you choose log base 2, then log2(2^i) = i is trivial. That eliminates one factor, and log2 has the advantage of being easier to compute than the natural logarithm (https://en.wikipedia.org/wiki/Binary_logarithm gives an algorithm, for example; there is also Shanks...).
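A minimal sketch of the base-2 variant, assuming the exponent triples are compared through the sign of their log2 difference. The exponent differences are taken in Int first, so the 2-part enters as an exact integer and only the 3- and 5-parts carry rounding error. The names log2of3, log2of5 and compareLog2 are hypothetical, and logBase 2 is used as a stand-in for a more careful log2 computation.

```haskell
-- Sketch: in base 2, log2 (2^i) = i, so only the 3- and 5-parts are approximated.

log2of3, log2of5 :: Double
log2of3 = logBase 2 3   -- computed as log 3 / log 2, so not correctly rounded
log2of5 = logBase 2 5

-- Compare 2^i1*3^j1*5^k1 with 2^i2*3^j2*5^k2 via the sign of the log2 difference.
compareLog2 :: (Int, Int, Int) -> (Int, Int, Int) -> Ordering
compareLog2 (i1, j1, k1) (i2, j2, k2) = compare delta 0
  where
    -- the i-term is an exact integer; rounding comes only from the 3- and 5-terms
    delta = fromIntegral (i1 - i2)
          + fromIntegral (j1 - j2) * log2of3
          + fromIntegral (k1 - k2) * log2of5 :: Double

main :: IO ()
main = mapM_ print
  [ compareLog2 (3, 0, 0) (0, 2, 0)   -- 8 vs 9
  , compareLog2 (4, 0, 0) (0, 0, 1)   -- 16 vs 5
  , compareLog2 (0, 5, 0) (1, 0, 3) ] -- 243 vs 250
```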

For log2(3) and log2(5), you would develop just enough terms to distinguish the two operands. I don't know whether that would take more operations than directly computing 3^j and 5^k in big-integer arithmetic and counting the high bit... But those could be pre-tabulated up to the required number of digits.
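The "count the high bit" part can be sketched as follows: the bit length of b^e gives floor(log2(b^e)) exactly, and those values can be tabulated once. This only recovers the integer part of the logarithm; tabulating further fractional digits is not shown. The helper names bitLength and floorLog2Pows are hypothetical, and the shift loop is written for clarity rather than speed.

```haskell
import Data.Bits (shiftR)

-- Sketch: exact floor (log2 (b^e)) from the bit length of big-integer powers.

-- Number of binary digits of a non-negative Integer (bitLength 0 == 0).
-- For n > 0 the high bit sits at position bitLength n - 1 = floor (log2 n).
bitLength :: Integer -> Int
bitLength = go 0
  where
    go acc m
      | m <= 0    = acc
      | otherwise = go (acc + 1) (m `shiftR` 1)

-- Table of floor (log2 (b^e)) for e = 0..n, built from exact powers of b.
floorLog2Pows :: Integer -> Int -> [Int]
floorLog2Pows b n = [ bitLength p - 1 | p <- take (n + 1) (iterate (* b) 1) ]

main :: IO ()
main = do
  print (floorLog2Pows 3 10)
  print (floorLog2Pows 5 10)
```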
