
Efficient (O(n^2)) Sieve of Sundaram in Haskell

There are many answers on SO explaining how to implement the Sieve of Sundaram in Haskell, but they're all... really inefficient?

All solutions that I have seen work like this:

  1. Figure out the numbers <= n to exclude
  2. Filter these from [1..n]
  3. Map each remaining number x to 2*x + 1

Here, for example, is my implementation, which finds all primes between 1 and 2n+2:

sieveSundaram :: Integer -> [Integer]
sieveSundaram n = map (\x -> 2 * x + 1) $ filter (flip notElem toRemove) [1..n]
  where toRemove = [i + j + 2*i*j | i <- [1..n], j <- [i..n], i + j + 2*i*j <= n]

The problem I have with this is that filter has to traverse the entire toRemove list for every element of [1..n], so this has complexity O(n^3), whereas a straightforward iterative implementation has complexity O(n^2). How can I achieve that in Haskell?

As per the comments, base should not be considered a complete standard library for Haskell. There are several key packages that every Haskell developer knows and uses, and would consider part of Haskell's de facto standard library.

By "straightforward iterative implementation", I assume you mean marking and sweeping an array of flags?通过“直接迭代实现”,我假设您的意思是标记和清除一组标志? It would be usual to use a Vector or Array for this.通常为此使用VectorArray (Both would be considered "standard".) An O(n^2) Vector solution looks like the following. (两者都将被视为“标准”。) O(n^2) Vector解决方案如下所示。 Though it internally uses a mutable vector, the bulk update operator (//) hides this fact, so you can write it in a typical Haskell immutable and stateless style:尽管它在内部使用了可变向量,但批量更新运算符(//)隐藏了这一事实,因此您可以将其编写为典型的 Haskell 不可变和无状态样式:

import qualified Data.Vector as V

primesV :: Int -> [Int]
primesV n = V.toList                           -- the primes!
  . V.map (\x -> (x+1)*2+1)                    -- apply transformation
  . V.findIndices id                           -- get remaining indices
  . (V.// [(k - 1, False) | k <- removals n])  -- scratch removals
  $ V.replicate n True                         -- everyone's allowed

removals :: Int -> [Int]
removals n = [i + j + 2*i*j | i <- [1..n], j <- [i..n], i + j + 2*i*j <= n]

Another possibility that's a little more straightforward is IntSet, which is basically a set of integers with effectively O(1) insertion/deletion and O(n) ordered traversal. (This is like the HashSet suggested in the comments, but specialized to integers.) It lives in the containers package, another "standard" package that's actually bundled with the GHC source, even though it's distinct from base. It gives an O(n^2) solution that looks like this:

import qualified Data.IntSet as I

primesI :: Int -> [Int]
primesI n = I.toAscList               -- the primes!
  . I.map (\x -> x*2+1)               -- apply transformation
  $ I.fromList [1..n]                 -- integers 1..n ...
    I.\\ I.fromList (removals n)      -- ... except removals
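As a quick sanity check of the set-based version, it can be run on a small limit (the definition is repeated here together with the original removals so the snippet is self-contained):

```haskell
import qualified Data.IntSet as I

-- same primesI as above
primesI :: Int -> [Int]
primesI n = I.toAscList               -- the primes!
  . I.map (\x -> x*2+1)               -- apply transformation
  $ I.fromList [1..n]                 -- integers 1..n ...
    I.\\ I.fromList (removals n)      -- ... except removals

-- the original O(n^2)-filtered removals
removals :: Int -> [Int]
removals n = [i + j + 2*i*j | i <- [1..n], j <- [i..n], i + j + 2*i*j <= n]
```

For example, `primesI 10` gives the odd primes up to 2*10+1 = 21, namely `[3,5,7,11,13,17,19]` (2 itself is never produced by Sundaram's transformation).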

Note that another important performance improvement is to use a better removals definition that avoids filtering all n^2 combinations. I believe the following definition produces the same list of removals:

removals :: Int -> [Int]
removals n = [i + j + 2*i*j | j <- [1..(n-1) `div` 3], i <- [1..(n-j) `div` (1+2*j)]]

and does so in what I believe is O(n log n). If you use it with either primesV or primesI above, it's the bottleneck, so the resulting overall algorithm should be O(n log n), I think.
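That "produces the same list" claim can be checked empirically. The improved definition enumerates each (i, j) pair in both orders, so the two lists agree as sets rather than verbatim; here is a small sketch comparing them (removalsOld and removalsNew are just local names for the two definitions above):

```haskell
import Data.List (nub, sort)

-- the original filtered definition
removalsOld :: Int -> [Int]
removalsOld n = [i + j + 2*i*j | i <- [1..n], j <- [i..n], i + j + 2*i*j <= n]

-- the improved definition with precomputed bounds
removalsNew :: Int -> [Int]
removalsNew n = [i + j + 2*i*j | j <- [1..(n-1) `div` 3], i <- [1..(n-j) `div` (1+2*j)]]

-- both definitions mark the same set of numbers <= n
sameRemovals :: Int -> Bool
sameRemovals n = sort (nub (removalsOld n)) == sort (nub (removalsNew n))
```

Since the results are only used to mark positions for removal, the duplicates in removalsNew are harmless.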

The question doesn't define what is meant by "inefficient". The OP seems to be set on using a Haskell lazy-list solution, and that will be inefficient from the outset: operations on lists are sequential, and lists carry a high constant overhead, requiring a memory allocation per element containing many internal "plumbing" parts to implement laziness.

As I mentioned in the comments, the original definition of the Sieve of Sundaram is obscure and includes many redundant operations due to oversubscribing the ranges representing odd numbers; it can be greatly simplified as described there.

However, even after minimizing the SoS's inefficiencies, and if lists are the way one wants to go: as the OP identifies, simple repeated filtering across the list isn't efficient, because there will be many repeated operations per list element, as in the following revised version of the OP's code:

sieveSundaram :: Int -> [Int]
sieveSundaram n = map (\x -> 2 * x + 3) $ filter (flip notElem toRemove) [ 0 .. lmt ]
  where lmt = (n - 3) `div` 2
        sqrtlmt = (floor(sqrt(fromIntegral n)) - 3) `div` 2
        mkstrtibp i = ((i + i) * (i + 3) + 3, i + i + 3)
        toRemove = concat [ let (si, bp) = mkstrtibp i in [ si, si + bp .. lmt ]
                            | i <- [ 0 .. sqrtlmt ] ]

main :: IO ()
main = print $ sieveSundaram 1000

As there are O(n log n) values in the improved concatenated toRemove list, and all of them must be scanned for each of the odd values up to the sieving limit, the asymptotic complexity of this is O(n^2 log n), which answers the question but isn't very good.

The fastest list-based prime filtering technique is to lazily merge the tree of generated composite-culling lists (instead of just concatenating them), and then generate an output list of all the odd numbers that aren't in the merged composites (in increasing order, so as to avoid scanning the whole list every time). A linear merge isn't that efficient, but we can use an infinite tree-like merge that only costs an extra factor of log n, which, multiplied by the O(n log n) number of cull values of the corrected Sieve of Sundaram, gives a combined complexity of O(n log^2 n), considerably less than the previous implementation.

This merging works because each successive composite-culling list starts from the square of the next odd number, two greater than the last, so the first values of the culling lists for successive base values in the overall list-of-lists are already in increasing order; thus, a simple merge sort of the list-of-lists doesn't race and is quite easy to implement:

primesSoS :: () -> [Int]   
primesSoS() = 2 : sel 3 (_U $ map(\n -> [n * n, n * n + n + n..]) [ 3, 5.. ]) where
  sel k s@(c:cs) | k < c     = k : sel (k+2) s  -- ~= ([k, k + 2..] \\ s)
                 | otherwise =     sel (k+2) cs --      when null(s\\[k, k + 2..]) 
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xs') ys@(y:ys') | x < y     = x : merge xs' ys
                              | y < x     = y : merge xs  ys'
                              | otherwise = x : merge xs' ys'

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ takeWhile (<= cLIMIT) $ primesSoS()

Of course, one must ask the question "Why the Sieve of Sundaram?", as when the cruft of the original SoS formulation is removed (see the Wikipedia article), it becomes obvious that the only difference between the SoS and the odds-only Sieve of Eratosthenes is that the SoS doesn't restrict the base culling odd numbers to just the primes, as the odds-only SoE does. The following code recursively feeds back only the found base primes:

primesSoE :: () -> [Int]   
primesSoE() = 2 : _Y ((3:) . sel 5 . _U . map (\n -> [n * n, n * n + n + n..])) where
  _Y g = g (_Y g)  -- = g (g (g ( ... )))   non-sharing multistage fixpoint combinator
  sel k s@(c:cs) | k < c     = k : sel (k+2) s  -- ~= ([k, k + 2..] \\ s)
                 | otherwise =     sel (k+2) cs --      when null(s\\[k, k + 2..]) 
  _U ((x:xs):t) = x : (merge xs . _U . pairs) t -- tree-shaped folding big union
  pairs (xs:ys:t) = merge xs ys : pairs t
  merge xs@(x:xs') ys@(y:ys') | x < y     = x : merge xs' ys
                              | y < x     = y : merge xs  ys'
                              | otherwise = x : merge xs' ys'

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ takeWhile (<= cLIMIT) $ primesSoE()

The fixpoint _Y combinator takes care of the recursion, and the rest is identical. This version reduces the complexity by a factor of log n, so the asymptotic complexity is now O(n log n log log n).
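For readers unfamiliar with `_Y`: applied to any stream-producing function, it simply ties the recursive knot. A tiny standalone illustration (the Fibonacci stream here is purely a hypothetical example, unrelated to the sieve):

```haskell
-- non-sharing fixpoint: _Y g = g (g (g ( ... )))
_Y :: (t -> t) -> t
_Y g = g (_Y g)

-- the classic corecursive Fibonacci stream, written via _Y
fibs :: [Integer]
fibs = _Y (\f -> 0 : 1 : zipWith (+) f (tail f))
```

`take 8 fibs` yields `[0,1,1,2,3,5,8,13]`. Because `_Y` is non-sharing, each recursive reference restarts the stream rather than sharing it, which in the sieve helps avoid retaining the entire produced list in memory.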

If one really wants efficiency, one doesn't use lists but rather mutable arrays. The following code implements the SoS over a fixed range using a built-in bit-packed array:

{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.ST ( runST )
import Data.Array.Base ( newArray, unsafeWrite, unsafeFreezeSTUArray, assocs )

primesSoSTo :: Int -> [Int] -- generate a list of primes to given limit...
primesSoSTo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2 -- limit index!
      oddcmpsts <- newArray (0, lmt) False -- indexed true is composite
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
          cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                        [ getbpndx i | i <- [ 0.. ] ] -- all odds!
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ primesSoSTo cLIMIT

with asymptotic complexity of O(n log n); the following code does the same for the odds-only SoE:

{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.ST ( runST )
import Data.Array.Base ( newArray, unsafeWrite, unsafeFreezeSTUArray, assocs )

primesSoETo :: Int -> [Int] -- generate a list of primes to given limit...
primesSoETo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2 -- limit index!
      oddcmpsts <- newArray (0, lmt) False -- when indexed is true is composite
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts -- frozen in place!
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3) -- index -> bp, si0
          cullcmpst i = unsafeWrite oddcmpsts i True -- cull composite by index
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) -- for bp's <= square root limit
                        [ getbpndx i | (i, False) <- assocs oddcmpstsf ]
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]

cLIMIT :: Int
cLIMIT = 1000

main :: IO ()
main = print $ primesSoETo cLIMIT

with asymptotic efficiency of O(n log log n).
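Either array version is easy to cross-check against a naive reference. Here is a sketch comparing the SoS array sieve from above (repeated unchanged so the snippet compiles on its own) with trial division:

```haskell
{-# LANGUAGE FlexibleContexts #-}

import Control.Monad.ST ( runST )
import Data.Array.Base ( newArray, unsafeWrite, unsafeFreezeSTUArray, assocs )

-- the primesSoSTo definition from above, unchanged
primesSoSTo :: Int -> [Int]
primesSoSTo limit
  | limit < 2 = []
  | otherwise = runST $ do
      let lmt = (limit - 3) `div` 2            -- limit index!
      oddcmpsts <- newArray (0, lmt) False     -- indexed true is composite
      let getbpndx i = (i + i + 3, (i + i) * (i + 3) + 3)
          cullcmpst i = unsafeWrite oddcmpsts i True
          cull4bpndx (bp, si0) = mapM_ cullcmpst [ si0, si0 + bp .. lmt ]
      mapM_ cull4bpndx
            $ takeWhile ((>=) lmt . snd) [ getbpndx i | i <- [ 0.. ] ]
      oddcmpstsf <- unsafeFreezeSTUArray oddcmpsts
      return $ 2 : [ i + i + 3 | (i, False) <- assocs oddcmpstsf ]

-- naive trial division, used only as a reference
primesNaiveTo :: Int -> [Int]
primesNaiveTo n = [ p | p <- [2..n], all (\d -> p `mod` d /= 0) [2 .. p - 1] ]
```

`primesSoSTo 1000 == primesNaiveTo 1000` holds, and the same check applies to `primesSoETo`.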

Both of these last versions are perhaps a hundred times faster than their list equivalents, due both to the reduced constant-factor execution time of mutable array operations compared to list operations and to the reduction of a log n factor in asymptotic complexity.
