简体   繁体   中英

Help with debugging unexpected takeWhile behaviour with large numbers in Haskell

Firstly, apologies for the vague title, but I'm not sure exactly what I'm asking here(!).

After encountering Haskell at university, I've recently started using it in anger and so am working through the Project Euler problems as an extended Hello World, really. I've encountered a bug in one of my answers that seems to suggest a misunderstanding of a fundamental part of the language, and it's not something I could work out from the tutorials, nor something I know enough about to start Googling for.

A brief description of the issue itself - the solution relates to primes, so I wanted an infinite list of prime numbers which I implemented (without optimisation yet!) thusly:

isPrime :: Int -> Bool
isPrime n = isPrime' 2 n
            where isPrime' p n | p >= n           = True
                               | n `mod` p == 0  = False
                               | otherwise       = isPrime' (p+1) n

primes :: [Int]
primes = filter isPrime [2..]

Since infinite lists can be a little tedious to evaluate, I'll of course be using lazy evaluation to ensure that just the bits I want get evaulatued. So, for example, I can ask GHCI for the prime numbers less than 100:

*Main> takeWhile (< 100) primes
[2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71,73,79,83,89,97]

Now here's the part that I don't understand at all - when the upper limit gets large enough, I get no answers back at all . In particular:

*Main> takeWhile (< 4000000000) primes
[]

This isn't a problem with takeWhile itself, or the partially-applied function, as takeWhile (< 4000000000) [2..] works as I would expect. It's not a problem with my use of filter (within the definition of primes ), since takeWhile (< 4000000000) (filter even [2..]) also returns the expected result.

Through binary search I found that the greatest upper limit that works is 2^31 - 1 , so this would certainly seem to imply some kind of space-based constraint (ie largest positive signed integer). However:

  1. I was of the impression that Haskell had no language limits on the size of integers, and they were bounded only by the amount of free memory.
  2. This number only appears in the less-than predicate, which I know works as expected in at least some cases. Surely when it's applied to elements of a list, it shouldn't care where they come from? Looking solely at the first element, I know that the predicate returns true for the input 2 when it comes from filter even [2..] ; I know that primes returns 2 as its first element. So how can my list be empty, how is this predicate failing "for some values of 2"?

Any thoughts would be grateful as I don't have enough experience to know where to start with this one. Thanks for taking the time to take a look.

There are 2 built-in integral types in haskell: Int and Integer . Integer is the default and is unbounded. Int however is bounded. Since you're explicitly using Int in the type for isPrime 4000000000 is used as an Int and overflows. If you change the type of isPrime to Integer -> Bool or even better Integral a => a -> Bool (read: a function that can take any kind of Integral value and returns a Bool ), it will work as expected.

The important thing to take away here (other than the difference between Int and Integer ) is that the type of 4000000000 depends on how it is used. If it is used as an argument to a function that takes an Int , it will be an Int (and on 32-bit systems it will overflow). If it is used as an argument to a function that takes an Integer , it will be an Integer (and never overflow). If it is used as an argument to a function that takes any kind of Integral , it will also be an Integer because Integer is the default instance of Integral .

That's an easy answer (...which I see has already been partly answered) - "premature specialization".

The first part of your definition, the type signature, specifies:

isPrime :: Int -> Bool

An Int is not just a "shortcut" way to say Integer - they are different types ! To be a nit-picker (which in turn invites every one else to tear apart the many places here, where I am not accurate), there are never "different values of 2 " - it has to be of type Int, because that's how you specified the function (you compare 2 to the function's argument n and you're only allowed to compare values of the same type, so your 2 is "pinned down" to the Int type.

Oh, and just as a warning, the Int type is a type just rife with corner case potential. If your system is built in a 64-bit environment, then your Int will also be based on a 64-bit representation, and your example will work up to 2^63-1, instead of 2^31-1 as yours did. Note my phrasing: I have a 64-bit computer with an MS Windows OS, which means that there is not yet an official 64-bit MinGW toolchain - my OS is 64-bit, but the GHC version I have was compiled with 32-bit libraries, so it has 32-bit-based Int s. When I use Linux, even in a VM, it has a 64-bit toolchain, so Int s are 64 bits. If you had used one of those, you may not have even noticed the behavior!

So, I guess that's just one more reason to be careful when reasoning about your types. (Especially in Haskell, anyway....)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM