
numpy Array Error: Summing elements gives wrong output

If I sum over a sequence of 0s and 1s in a plain Python loop, I get a different result from doing the same thing with a numpy array. Why is that happening, and what is the solution? The code is given below:

import numpy as np

vl_2 = vl_1 = 0
string_1 = "00001000100111000010001001100001000100110000100010011000010001011100001"
sb = string_1
table = bytearray.maketrans(b'01', b'\x00\x01')
X = bytearray(sb, "ascii").translate(table)
Y = 2.**(np.nonzero(X)[0] + 1)  # X = np.nonzero(sb)[0]
for i in range(len(sb)):
    vl_1 = vl_1 + X[i]*2**(i+1)
for y in np.nditer(Y):
    vl_2 = vl_2 + y

Note that I am doing the same math operation in both loops, so vl_2 == vl_1 should be True, but I get False.

Edit:

  1. This problem occurred in vectorized code, so speed is an issue and any solution should take that into account. The solution should therefore stay with numpy rather than some slower alternative.

The loop over np.nditer(Y) works on float values (the ones printed in scientific notation), and the float rounding throws off the calculations a little bit. I changed the loop a little bit:

vl_2_2 = 0
for y in np.nditer(Y):
    vl_2 = vl_2 + y
    vl_2_2 = vl_2_2 + int(y.item())
    print(f'{vl_2} {int(vl_2)} {vl_2_2}')

vl_2 is the original calculation.

vl_2_2 does the same calculation after converting y to an int.

In the printout I also print vl_2 as an int after the calculation.

The results are the same in both loops up to the point where the values switch to scientific notation, i.e. where the float can no longer hold the integer exactly (see the sketch after the printouts).

First loop (without duplicates):

32
544
4640
12832
29216
553504
8942112
76050976
210268704
4505236000
73224712736
622980526624
1722492154400
36906864243232
599856817664544
5103456445035040
14110655699776032
302341031851487776
4914027050278875680
23360771123988427296
60254259271407530528
134041235566245736992
2495224477001068343840

Second loop (look at the first number for the original):

32.0 32 32
544.0 544 544
4640.0 4640 4640
12832.0 12832 12832
29216.0 29216 29216
553504.0 553504 553504
8942112.0 8942112 8942112
76050976.0 76050976 76050976
210268704.0 210268704 210268704
4505236000.0 4505236000 4505236000
73224712736.0 73224712736 73224712736
622980526624.0 622980526624 622980526624
1722492154400.0 1722492154400 1722492154400
36906864243232.0 36906864243232 36906864243232
599856817664544.0 599856817664544 599856817664544
5103456445035040.0 5103456445035040 5103456445035040
1.4110655699776032e+16 14110655699776032 14110655699776032
3.0234103185148774e+17 302341031851487744 302341031851487776
4.914027050278875e+18 4914027050278875136 4914027050278875680
2.3360771123988427e+19 23360771123988426752 23360771123988427296
6.025425927140753e+19 60254259271407534080 60254259271407530528
1.3404123556624574e+20 134041235566245740544 134041235566245736992
2.4952244770010683e+21 2495224477001068314624 2495224477001068343840
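
A minimal sketch (not part of the original answer) of why the two columns eventually drift apart: np.float64 has a 53-bit significand, so not every integer above 2**53 (about 9.0e15) is exactly representable, and that is roughly where the printout switches to scientific notation and the running sums start to differ.

# Pure-Python demonstration of the float64 precision limit at 2**53.
print(2**53)                              # 9007199254740992
print(float(2**53) == float(2**53 + 1))   # True: both round to the same float64
print(int(float(2**53 + 1)))              # 9007199254740992, the +1 is lost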

With your setup - I like to see some values, not just a vague "not the same" claim.

In [70]: Y
Out[70]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])


In [72]: X
Out[72]: bytearray(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x01\x01\x00\x00\x00\x00\x01')

In [73]: for i in range(len(sb)): 
    ...:                     vl_1 = vl_1+X[i]*2**(i+1)
    ...:                     

In [74]: vl_1
Out[74]: 2495224477001068343840


In [76]: for y in np.nditer(Y)  :
    ...:                     vl_2=vl_2+y
    ...:                     

In [77]: vl_2
Out[77]: 2.4952244770010683e+21

One is a float (after all, Y is float), but otherwise the values are the same (within float precision):

In [78]: vl_1-vl_2
Out[78]: 0.0
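
A small sketch (not from the answer) of why vl_2 == vl_1 was nevertheless False for the original poster: the subtraction coerces the int to float64 (rounding it), while == compares the exact integer against the exact value of the float.

# Using the two final values from the session above.
vl_1 = 2495224477001068343840   # exact Python int from the integer loop
vl_2 = 2.4952244770010683e+21   # float64 result of summing Y

print(vl_1 - vl_2)    # 0.0   -> vl_1 is rounded to float64 before subtracting
print(int(vl_2))      # 2495224477001068314624, not 2495224477001068343840
print(vl_1 == vl_2)   # False -> int/float comparison in Python is exact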

nditer does nothing for you:

In [79]: vl_2=0
    ...: for y in Y  : vl_2=vl_2+y

In [80]: vl_2
Out[80]: 2.4952244770010683e+21

But iterating on arrays is slower, and you don't need it:

In [81]: np.sum(Y)
Out[81]: 2.4952244770010683e+21

edit

If you replace 2. with 2 when constructing Y:

In [95]: 2.**(np.nonzero(X)[0]+1)
Out[95]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])

In [96]: 2**(np.nonzero(X)[0]+1)
Out[96]: 
array([                 32,                 512,                4096,
                      8192,               16384,              524288,
                   8388608,            67108864,           134217728,
                4294967296,         68719476736,        549755813888,
             1099511627776,      35184372088832,     562949953421312,
          4503599627370496,    9007199254740992,  288230376151711744,
       4611686018427387904,                   0,                   0,
                         0,                   0], dtype=int64)

The second gives integer values, but the last 4 are too large for int64 and silently wrap around, which is why they come out as 0.
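
A short sketch (not part of the answer) of that wrap-around, reusing exponent values that appear above:

import numpy as np

# int64 overflows silently: exponents of 64 and above wrap around modulo 2**64,
# which is why the last entries of the integer array above are 0.
exponents = np.array([62, 64, 65], dtype=np.int64)
print(2 ** exponents)   # [4611686018427387904                   0                   0]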

Skipping the last part of X, I get the same integer result:

In [100]: sum(2**(np.nonzero(X[:-8])[0]+1))
Out[100]: 4914027050278875680

In [101]: sum([x*2**(i+1) for i,x in enumerate(X[:-8])])
Out[101]: 4914027050278875680

The other answer suggested going with object dtype. While it may work, it loses most of the speed advantages of working with numeric dtype arrays.

object speed

As proposed in the other answer, converting the nonzero result to object produces large-enough Python ints:

In [166]: 2**(np.nonzero(X)[0]+1).astype(object)
Out[166]: 
array([32, 512, 4096, 8192, 16384, 524288, 8388608, 67108864, 134217728,
       4294967296, 68719476736, 549755813888, 1099511627776,
       35184372088832, 562949953421312, 4503599627370496,
       9007199254740992, 288230376151711744, 4611686018427387904,
       18446744073709551616, 36893488147419103232, 73786976294838206464,
       2361183241434822606848], dtype=object)
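
A quick self-contained check (a sketch, not from the answer) that the object-dtype sum reproduces the exact integer computed by the loop (vl_1 above):

import numpy as np

# Rebuild X as in the question, then sum using arbitrary-precision Python ints.
table = bytearray.maketrans(b'01', b'\x00\x01')
X = bytearray("00001000100111000010001001100001000100110000100010011000010001011100001",
              "ascii").translate(table)
exact = np.sum(2 ** (np.nonzero(X)[0] + 1).astype(object))
print(exact)                              # 2495224477001068343840
print(exact == 2495224477001068343840)    # True: no precision loss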

Some comparative times:

In [167]: timeit np.sum(2**(np.nonzero(X)[0]+1).astype(object))
46.5 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The approximate float approach:

In [168]: timeit np.sum(2.**(np.nonzero(X)[0]+1))
32.3 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The enumerated list:

In [169]: timeit sum([x*2**(i+1) for i,x in enumerate(X)])
43.1 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Working with an object dtype array doesn't help, speedwise.

The list version of nonzero_bits is even faster:

In [173]: %%timeit
     ...: nonzero_bits = [i for i, x in enumerate(X) if x != 0]
     ...: vl = sum(2 ** (i + 1) for i in nonzero_bits)
18.9 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

First, using numpy but still looping with a Python for loop is not vectorization and will not improve performance (it will even be worse, because of numpy array instantiation overhead).

Second, you're handling very large numbers, beyond the capacity of numpy's native C integer types, but native Python int can handle them, so you need to specify dtype=object so that numpy does not cast them (see https://stackoverflow.com/a/37272717/13636407).

Even then, because dtype=object is used, numpy can't vectorize, so there is no performance improvement from using numpy, as @hpaulj noticed (see the performance tests below).

import numpy as np

def using_list(s):
    X = to_bytearray(s)
    nonzero_bits = [i for i, x in enumerate(X) if x != 0]
    return sum(2 ** (i + 1) for i in nonzero_bits)

def using_numpy(s):
    # because large numbers, need to convert to dtype=object
    # see https://stackoverflow.com/a/37272717/13636407
    X = to_bytearray(s)
    nonzero_bits = np.nonzero(X)[0].astype(object)
    return np.sum(2 ** (nonzero_bits + 1))

table = bytearray.maketrans(b"01", b"\x00\x01")

def to_bytearray(s):
    return bytearray(s, "ascii").translate(table)

Equality check:

s = "00001000100111000010001001100001000100110000100010011000010001011100001"

vl_list = using_list(s)
vl_numpy = using_numpy(s)

assert vl_list == vl_numpy

Performance tests:

>>> %timeit using_list(s)
... %timeit using_numpy(s)
... print()
... %timeit using_list(s * 10)
... %timeit using_numpy(s * 10)
... print()
... %timeit using_list(s * 100)
... %timeit using_numpy(s * 100)

10.1 µs ± 81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
18.1 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

128 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
104 µs ± 605 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

9.88 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.77 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
