Unstable results from Python factoring function

Question

def test_prime(n):
    q = True
    for p in range(2,n):  #Only need to check up to rootn for primes and n/2 for factors
        if int(n%p) is 0:         
            q = False
            print(p, 'and', int(n/p), 'are factors of ', n)    
    if q:
        print(n, 'IS a prime number!')
    else:
        print(n, 'IS NOT a prime number')

I've just started playing around with Python and I'm putting together some bits and pieces to pass the time. I've been playing about with testing for prime numbers and had the idea of showing the factors for non-primes. The function I've put together above seems to work well enough, except that it gives inconsistent outputs.

E.g. if I set n = 65432 I get...

2 and 32716 are factors of  65432
4 and 16358 are factors of  65432
8 and 8179 are factors of  65432
8179 and 8 are factors of  65432
16358 and 4 are factors of  65432
32716 and 2 are factors of  65432
65432 IS NOT a prime number

which is what I'd expect. But if I set n = 659306 I get...

2 and 329653 are factors of  659306
71 and 9286 are factors of  659306
142 and 4643 are factors of  659306
4643 and 142 are factors of  659306
9286 and 71 are factors of  659306
659306 IS NOT a prime number

which is different because it doesn't include the factor 329653 at the very end. This isn't a problem as all the factors are displayed somewhere but it is annoying me that I don't know WHY this happens for some numbers!

Just to show you that I'm not a complete moron, I have worked out that this seems only to happen with integer values over 5 chars in length. Can someone please tell me why the outputs are different in these two cases?

Answer 1

You want n % p == 0, not n % p is 0. The is operator tests identity, not equality, and not every 0 is the same object as every other 0.

>>> 659306 % 329653
0
>>> (659306 % 329653) == 0
True
>>> (659306 % 329653) is 0
False
>>> id(0)
136748976
>>> id(659306 % 329653) 
3070888160

The id there basically corresponds to a location in memory.

Think of it this way: if you have a loonie, and I have a loonie, then they're equal to each other in value (1 == 1), but they're not the same object (my one dollar coin is not the same as your one dollar coin.) We could share the same coin, but it's not necessary that we do.
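The same distinction shows up with any two objects that hold equal values. Here is a minimal sketch that uses lists rather than integers, so the result does not depend on any caching done by the interpreter (the variable names are purely illustrative):

```python
mine = [1]       # my coin
yours = [1]      # your coin: same value, separate object
shared = mine    # a second name for my coin, not a new coin

print(mine == yours)    # True: equal in value
print(mine is yours)    # False: two distinct objects
print(shared is mine)   # True: one object with two names
```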

[PS: You can use n//p for integer division instead of int(n/p).]

Answer 2

What's happening behind the scenes is a little complicated. My comments apply specifically to CPython. Other implementations such as PyPy, Jython, IronPython, etc. may behave differently.

To decrease memory usage and improve performance, CPython caches a range of small integers (-5 through 256) and tries to return a reference to these objects instead of creating another integer object with the same value. When you compare numbers with is, you are actually checking whether CPython returned a reference to the same cached object. But sometimes CPython doesn't check whether a value is one of the cached integers. How could this happen?
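One rough way to see the cache at work is to build the same integer twice at run time and compare the two results. This is only a sketch: same_object and same_value are hypothetical helper names, and what same_object returns is an implementation detail that can differ between interpreters and versions. Only the == result is guaranteed by the language.

```python
def same_object(text):
    """Build the same integer twice at run time and test identity."""
    first = int(text)
    second = int(text)
    return first is second      # implementation-dependent result

def same_value(text):
    """Build the same integer twice at run time and test equality."""
    return int(text) == int(text)   # always True

for text in ("0", "256", "257", "659306"):
    print(text, same_value(text), same_object(text))
```

On a typical CPython build, the identity test is expected to succeed only for values inside the cached range, while the equality test succeeds for all of them.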

I'll explain CPython 3 since it is a little easier than CPython 2. The int type visible in CPython is actually called PyLong internally to the interpreter. PyLong stores an integer as an array of digits where each digit is between 0 and 2**15-1 (32-bit systems) or 0 and 2**30-1 (64-bit systems). The array grows in size as the numbers get larger; this allows effectively unlimited integers. When calculating %, CPython checks if the second argument is one digit long. If so, it calls a C function (divrem1) that returns a digit as the result. Next, PyLong_FromLong is called to convert a value that fits into a C long (i.e. the return value of divrem1) into a PyLong. PyLong_FromLong checks if the argument is in the range of cached integers and will return a reference to the cached integer if possible.
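To make the "array of digits" idea concrete, here is a small model of the representation. It is not CPython's actual code, and to_digits is a hypothetical helper. Assuming the build in the question used 15-bit digits (an inference from the explanation above, not something the question states), every divisor that was printed fits in one digit, while 329653 needs two.

```python
import sys

def to_digits(n, bits):
    """Split a non-negative integer into base-2**bits digits, least significant first."""
    if n == 0:
        return [0]              # simplification: this model uses one zero digit
    mask = (1 << bits) - 1
    digits = []
    while n:
        digits.append(n & mask)  # lowest `bits` bits form the next digit
        n >>= bits
    return digits

# Digit width used by the interpreter running this script
print(sys.int_info.bits_per_digit)

for p in (9286, 329653):
    print(p, to_digits(p, 15), to_digits(p, 30))
```

With 15-bit digits, 9286 is the single digit [9286] and 329653 becomes [1973, 10], so only the latter would take the multi-digit path.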

If the second argument is more than one digit long, a different C function (x_divrem) is called. x_divrem uses a general purpose arbitrary precision division algorithm to compute the remainder. Since x_divrem creates a PyLong to store the remainder during calculation, there is no advantage gained by avoiding the creation of another duplicate integer; it already exists. For calculations with random large numbers, the remainder will rarely be equal to one of the cached integers, so it isn't worth the time penalty to make the check.

There are other ways to create duplicate copies of the cached integers. I just analyzed the one from the question.

And this is why you don't use is for checking numeric equality.

2 answers

solution1
10 2013-01-29 21:40:41

solution2
3 2013-01-30 03:44:07
