简体   繁体   中英

Generating finite lists of primes in Haskell

There are a lot topics on generating prime numbers in Haskell, but in my opinion, they all rely on ' isPrime ' function, which, if we don't know the primes sequence yet, should look like:

isPrime k = if k > 1 then null [ x | x <- [2,3..(div k 2) + 1], k `mod` x == 0]
                     else False

( div might be replaced with sqrt , but still...)

I've tried to construct prime numbers based on 'inductive definition' (assume we have a set of first n primes, then (n+1)th prime is the least integer such that none of the first n primes is a divisor of it). I've tried to do it in the Fibonacci sequence way, which is:

fib :: Int -> Int
fib 0 = 0
fib 1 = 1
fib n = fibs !! n
    where fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

And I ended up with this:

-- checking if second number is a divisor of first one
ifDoesn'tDivide :: Int -> Int -> Bool
ifDoesn'tDivide n k 
    | mod n k == 0 = False
    | otherwise    = True

-- generating list which consists of first n prime numbers
firstPrimes :: Int -> [Int]
-- firstPrimes 1  = [2]
firstPrimes n     = take n primes 
    where primes = 2:(tail primes) ++ 
         [head [x | x <- [3,4..], k <- primes, ifDoesn'tDivide x k == True]]

But it doesn't work, stack overflow when n >= 2 . Any advice on how to fix it?

"Haskell can define data structures in terms of themselves in effect creating infinite data structures" . Those prime numbers and Fibonacci sequences mentioned earlier are specific cases of defining data structures in terms of themselves, and Fibonacci sequence works just fine, but these primes doesn't.

Am I missing something, are those two algorithms different in a substantive way?

PS So, I think, I'm just looking for most 'Haskellish' way to do that.

You can always use a sieve which is rather elegant in Haskell.

primes = sieve [2..]

sieve (p : xs) = p : sieve [ x | x <- xs, x `mod` p > 0 ]

So to get the first 10 primes

> take 10 primes
[2,3,5,7,11,13,17,19,23,29]

Notice that while isPrime is not explicitly used the list comprehension ensures that every number on the list must be prime relative to all the primes preceding it, which is to say prime.

This is more efficient and it's at the heart of Eratosthenes' sieve (Edit).

The code above is the first example in:

The paper goes in to much more detail about the efficient implementation of sieves in Haskell and the role of laziness in the computation. Highly recommended!

The core of your solution attempt is the infinite list of primes given by:

primes = 2:(tail primes)
  ++ [head [x | x <- [3,4..]
              , k <- primes
              , ifDoesn'tDivide x k == True]]

Update: You mentioned in a comment that you were thinking of this algorithm imperatively, so you were imagining that Haskell would use a "current" value of tail primes that was still empty in order to evaluate something like [2] ++ [] ++ [3] and then loop. But, of course, Haskell isn't imperative and so doesn't work like this. In Haskell, primes has one, fixed definition that stays the same throughout the execution of the program. A Haskell program can "discover" (or more accurately "compute") the definition gradually, which allows us to define primes in terms of itself in the first place, but it can't change the definition over the course of execution.

So, in looking at this definition, you need to imagine that primes and consequently tail primes have the same value everywhere they appear, even when being used recursively. This is different than a typical recursive function that takes an argument:

fact 0 = 1
fact n = n * fact (n-1)

Here, even though the function fact has the same definition everywhere it appears, the value of fact n on the left-hand side and the value of fact (n-1) on the right-hand side can be different, thanks to the different argument.

Anyway, if we look at this primes definition with the idea that we need primes to be the infinite list of all primes everywhere it appears (and not with a value that changes or "grows" over time), then you can see why this definition won't work. Here, primes is defined as 2 : tail primes ++ [expr] for a complicated expr that does all the real work, but tail primes is supposed to be infinite, so in evaluating this expression you'll never even get to expr , because you'll never exhaust the list tail primes .

Even ignoring the ++ [expr] bit, because primes has a single fixed definition, an expression like:

primes = 2 : tail primes

isn't the right way to define an infinite list in terms of itself. The problem is that the second element of primes is defined to be the first element of tail primes , which is the second element of primes , so the second element of primes is defined as itself. That will create an infinite loop when Haskell tries to "discover"/"compute" its value. The key to the fibs definition:

fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

is that the first and second elements are given, and then the third element is defined as the sum of the first two elements -- it isn't defined in terms of itself, but rather, it's defined in terms of earlier list elements. That's the key to a successful recursive definition of an infinite (or even finite) list.

Note that Haskell doesn't "know" that fib is an infinite list and doesn't do anything special for infinite lists. This would work the same way with recursively defined finite lists:

countdown = 10 : takeWhile (> 0) (map (subtract 1) countdown)

The key again is that each element in countdown is defined in such a way that it depends only on the previous element of countdown .

To modify your primes definition to work this way, what you probably want to do is generalize your list comprehension from getting the next prime after "2" to get the "next" prime after any current prime p , based on having primes available:

primeAfter p = head [x | x <- [p+1..], k <- primes, ifDoesn'tDivide x k]

This won't work for two reasons. First, because primes is infinite, this will keep checking divisibility by different k values forever. We need to modify it to check primes k only up to the current prime p :

primeAfter p = head [x | x <- [p+1..]
                       , k <- takeUntil (==p) primes
                       , ifDoesn'tDivide x k]

This uses a helper that takes the beginning of a list up until the first element where a predicate is true:

takeUntil p lst = case break p lst of (a,y:b) -> a ++ [y]

Second, the check is structured incorrectly. This list comprehension will allow through an x if there's any prime k that doesn't divide it. We need to let through an x only if all primes k don't divide it:

primeAfter p = head [x | x <- [p+1..]
                       , and [ifDoesn'tDivide x k
                               | k <- takeWhile (<=p) primes]]

then it has a chance of working, and we can define primes as:

primes = go 2
  where go p = p : go (primeAfter p)

Here, go adds the current prime to the list and then recurses to the next prime, using primeAfter . This works because even though primeAfter p accesses the infinite list primes being generated by the recursive go call, it only uses that list up to the current prime p , so it stops just before trying to access its own value in the list, only using primes generated before the call the primeAfter p .

So, this works, and I think is much in the spirit of your original attempt:

-- note this simplified version:
ifDoesn'tDivide :: Int -> Int -> Bool
ifDoesn'tDivide n k = mod n k /= 0

primes :: [Int]
primes = go 2
  where go p = p : go (primeAfter p)

primeAfter :: Int -> Int
primeAfter p = head [x | x <- [p+1..]
                       , and [ifDoesn'tDivide x k
                               | k <- takeUntil (==p) primes]]

takeUntil :: (a -> Bool) -> [a] -> [a]
takeUntil p lst = case break p lst of (a,y:b) -> a ++ [y]

main :: IO ()
main = do
  print $ take 10 primes

As @Mihalis has pointed out, primes is a pretty standard example in Haskell, so there are more elegant, one-line solutions to this, too.

TL;DR: no, the two algorithms are not substantially different.


Your definition, primes = 2:(tail primes) ++ .... says that head primes = 2 and head (tail primes) = head ((tail primes) ++ ....) = head (tail primes) . And that's of course problematic, causes infinite recursion.


The smallest fix to your code while preserving its intent is probably

firstPrimes1 :: Int -> [Int]
firstPrimes1 1  = [2]
firstPrimes1 n  =  ++ 
         take 1 [x | x <- [3,4..], 
                     and [ mod x k > 0 | k <- ]]

(this uses take 1 ... in place of your [head ...] ).

It is unbelievably slow ( looks exponential, or worse). But it should have been, of course,

firstPrimes2 1  = [2]
firstPrimes2 n  =  ++ 
         take 1 [x | x <- [3,4..], 
                     and [ mod x k > 0 | k <- ]]

which is now simply very slow, about cubic in time complexity. But it should have really been this, though:

firstPrimes2b 2  = [2]
firstPrimes2b n  = let { ps = firstPrimes2b (n-1) } in
       ps ++ 
         take 1 [x | x <- [..], 
                     and [ mod x k > 0 | k <- ps]]

which now behaves as if quadratic , and indeed is yet much faster than its predecessor in concrete terms as well.

To structure it like the Fibonacci stream, it could be written as

 = 2 : concatMap foo [1..]
  where
  foo k = let {  = take k  } in
          take 1 [ x | x <- [last ps+1..], 
                       and [ mod x k > 0 | k <- ]]
-- or 
 = 2 : concatMap bar (tail (inits ))
  where
  bar  = take 1 [ x | x <- [last ps+1..], 
                        and [ mod x k > 0 | k <- ]]
-- or even 
 = 2 : [p | (, ) <- zip (tail (inits )) 
                 , p <- take 1 [ x | x <- [+1..], 
                                     and [ mod x k > 0 | k <- ]]]

Indeed it looks like it follows an inductive pattern, specifically that of complete aka "strong" induction, forall(n).(forall( k < n ).P(k)) => P(n) .

So it is not fundamentally different from the Fibonacci calculation, although the latter refers only to the previous two elements whereas this one refers to all the previous elements while adding the new one. But just as the Fibonacci stream, this sequence too is defined ultimately in terms of itself: primes = ..... primes ...... .

The inits makes bar refer to the previously known primes ps explicitly while adding one more to them at each step ( expressed by take 1 ), just like you wanted. concatMap collects all the new one-element segments produced by each invocation of bar .

But why should that be only one prime? Couldn't we safely produce more than one new prime, from the k known previous primes? Must we really test the candidates by all the preceding primes, or can we use the well-known shortcut which you also mention in the question? Can we make it follow the pattern of complete prefix induction, forall(n).(forall( k < floor(sqrt(n)) ).P(k)) => P(n) , so that only O(log log n) expansion steps are needed to get to the n th prime?

Could we be producing longer segments on each step from each prefix of the primes sequence (which sequence always stays the same, of course), thus referring not to all the preceding primes for each candidate, but only to a much smaller portion of them?...


True sieve of Eratosthenes' most direct expression in Haskell is

import qualified Data.List.Ordered as O (minus)

primes = map head $ scanl (O.minus) [2..] [[p,p+p..] | p <- primes]

(With its obvious semantics, minus is easy to implement yourself, if not load from the data-ordlist package.)

Although Rev. S. Horsley, when he (re?-)introduced it in 1772, (*) described the sieve of Eratosthenes as the equivalent of

oprimes = map head $ 
       scanl (O.minus . tail) [3,5..] [[p*p,p*p+2*p..] | p <- oprimes]

primes2 = 2 : oprimes

primesUpTo n = 2 : map head a ++ takeWhile (<= n) b
   where
   (a,b:_) = span ((<= n) . (^2) . head) $
       scanl (O.minus . tail) [3,5..] [[p*p,p*p+2*p..] | p <- oprimes]

Running length $ primesUpTo n is immensely faster than length . takeWhile (<= n) primes length . takeWhile (<= n) primes . Can you see why?

Can you fix primes2 so it becomes as fast as primesUpTo , in accessing its n th element? It can follow your original thought, extending the known segment of primes, step by step, as alluded to in the previous section.

Also, do note that no isPrime function is used here at all. Which is the hallmark of the true sieve of Eratosthenes, which does not test for primality, it generates the composites, and gets the primes between the composites, for free.


How the first scanl code works: it starts with the sequence [2,3,4,5,...] . Then it makes a notice to remove [2,4,6,8,...] from it, and is left with the equivalent of [3,5,7,9,...] ie coprimes({2}) .

(This works, even though the lists are infinite, because Haskell has lazy evaluation -- only as much calculations are performed as demanded by the needs of performing the final output of the program.)

Then it makes a notice to remove from them the list [3,6,9,12,..] , and is left with coprimes({2,3}) .

At each stage it takes the head off the sequence-at-that-point-in-time , and puts that head element aside, thus forming the resulting sequence of primes.

(The same could be coded with iterate (or unfoldr , etc.). It's a nice exercise, can help clarify what's going on there exactly. When you'll do this, you'll see you'll be re-creating the primes sequence as part of the arguments to the step function being iterated (the current sequence of the first k primes' coprimes, and the next, k+1 -th prime, to remove its multiples from that sequence). The scanl versions refer to the original sequence of primes explicitly, taking the primes from it one after another, but it's the same thing.)

The second scanl variant only enumerates the prime's odd multiples, starting each enumeration from the prime's square (so, for eg 3 it's [9,15,21,27,...] , and for 7 it's [49,63,77,91,...] ). It still starts that enumeration for each prime though, not for each prime's square; that's why it has to make special arrangements to stop as soon as it's okay for it to stop, in the primesUpTo function. Which is the key to its efficiency .


(*) pg 314 of Philosophical Transactions, Vol.XIII.


see also: minus defined and used here , or here .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM