Possible pandas bug?

I'm seeing some strange behavior in Python/pandas.

I know the setup is convoluted; I was working on some... challenges.

import pandas as pd

def lucas_n(n):
    '''Return the first n Lucas numbers modulo 1_000_007'''
    my_list = [1,3]
    while len(my_list) < n:
        my_list.append((my_list[-1]+my_list[-2])%1_000_007)
    return my_list

def f(seq):
    '''Look up https://projecteuler.net/problem=739'''
    
    df = pd.Series(seq)
    
    for i in range(len(seq)-1):
        df = df.iloc[1:].cumsum()
        
    return df.iloc[0]

x = lucas_n(10_000)  # an int count instead of the float 1e4

f(x)

>>> -8402283173942682253

In short, x is a sequence of positive integers, and f applies consecutive .iloc[1:].cumsum() operations.
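
To make the operation concrete, here is a minimal sketch of the same loop on a tiny input, a sequence of eight ones, recording each intermediate result. The name `steps` is just an illustrative helper and not part of the code above.

```python
import pandas as pd

s = pd.Series([1] * 8)
steps = []

# Repeat until one value is left: drop the first element, then take running sums.
while len(s) > 1:
    s = s.iloc[1:].cumsum()
    steps.append(s.tolist())

for row in steps:
    print(row)
# First row printed: [1, 2, 3, 4, 5, 6, 7]
# Last row printed:  [429]
```

The values grow quickly even for this short input, which matters once the sequence has thousands of entries.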

And the output is negative...

Is this a bug? A data type issue?

It appears that you have an integer overflow. In Python itself, integers have arbitrary precision, but pandas/NumPy use fixed-width C data types by default (int64 here), so a sum that exceeds the 64-bit range wraps around silently and can come out negative.
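
The wraparound is easy to reproduce in isolation. The following is a small sketch, assuming the default int64 inference that pandas applies to a list of Python ints; the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

max_int64 = np.iinfo(np.int64).max   # 9223372036854775807

s = pd.Series([max_int64, 1])        # dtype is inferred as int64
wrapped = s.cumsum().iloc[-1]        # exceeds the int64 range and wraps, no exception raised

exact = sum([max_int64, 1])          # plain Python ints have arbitrary precision

print(s.dtype)    # int64
print(wrapped)    # -9223372036854775808
print(exact)      # 9223372036854775808
```

No error is raised for the wrapped value, which is why the negative result in the question appears without any hint of what went wrong.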

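To fix this, keep the values as arbitrary-precision Python integers by giving the Series object dtype. Note that .astype('int') is not enough, because it still yields a fixed-width NumPy integer:

def f(seq):
    '''Look up https://projecteuler.net/problem=739'''
    
    df = pd.Series(seq, dtype=object) # object dtype stores Python ints, which cannot overflow
    
    for i in range(len(seq)-1):
        df = df.iloc[1:].cumsum()
        
    return df.iloc[0]

This avoids the overflow, although arithmetic on object dtype is noticeably slower than on int64.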
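
A quick way to check whether a given dtype really avoids the overflow is to compare against a pure-Python reference on a smaller input. The sketch below is only an illustration: `f_exact` and `f_series` are hypothetical helper names, `lucas_n` is restated from the question, and the input length of 200 is an arbitrary choice that is already large enough to exceed the int64 range.

```python
from itertools import accumulate

import pandas as pd


def lucas_n(n):
    """First n Lucas numbers modulo 1_000_007 (restated from the question)."""
    seq = [1, 3]
    while len(seq) < n:
        seq.append((seq[-1] + seq[-2]) % 1_000_007)
    return seq[:n]


def f_exact(seq):
    """Reference implementation using only Python ints (hypothetical helper)."""
    vals = list(seq)
    for _ in range(len(seq) - 1):
        vals = list(accumulate(vals[1:]))  # drop first element, then running sums
    return vals[0]


def f_series(seq, dtype):
    """Same loop as in the question, with an explicit Series dtype."""
    s = pd.Series(seq, dtype=dtype)
    for _ in range(len(seq) - 1):
        s = s.iloc[1:].cumsum()
    return s.iloc[0]


x = lucas_n(200)
exact = f_exact(x)
as_object = f_series(x, object)    # elements stay Python ints
as_int64 = f_series(x, "int64")    # fixed-width, wraps around on overflow

print(as_object == exact)          # True
print(int(as_int64) == exact)      # False
```

Only the object-dtype Series agrees with the reference here, since the exact result is far larger than anything int64 can represent.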
