If I sum through an array of 0 and 1, I get a different result than when I do the same thing through a numpy array. Why is that happening, and what is the solution? The code is given below:
import numpy as np

vl_2 = vl_1 = 0
string_1 = "00001000100111000010001001100001000100110000100010011000010001011100001"
sb = string_1
table = bytearray.maketrans(b'01', b'\x00\x01')
X = bytearray(sb, "ascii").translate(table)
Y = 2.**(np.nonzero(X)[0]+1)  # X=np.nonzero(sb)[0]
for i in range(len(sb)):
    vl_1 = vl_1 + X[i]*2**(i+1)
for y in np.nditer(Y):
    vl_2 = vl_2 + y
Note that I am doing the same math operation in both loops, so vl_2 == vl_1 should be True, but I get False.
Edit:
The loop over np.nditer(Y) switches to scientific notation, which throws the calculations off a little. I changed the loop slightly:
vl_2 = vl_2_2 = 0
for y in np.nditer(Y):
    vl_2 = vl_2 + y
    vl_2_2 = vl_2_2 + int(y.item())
    print(f'{vl_2} {int(vl_2)} {vl_2_2}')
vl_2 is the original; vl_2_2 does the calculation after converting y to an int. In the printout I also print vl_2 converted to an int after the calculation.
The results are the same in both loops up to the point where the values switch to scientific notation.
First loop (without duplicates):
32
544
4640
12832
29216
553504
8942112
76050976
210268704
4505236000
73224712736
622980526624
1722492154400
36906864243232
599856817664544
5103456445035040
14110655699776032
302341031851487776
4914027050278875680
23360771123988427296
60254259271407530528
134041235566245736992
2495224477001068343840
Second loop (columns are vl_2, int(vl_2), vl_2_2; the first is the original):
32.0 32 32
544.0 544 544
4640.0 4640 4640
12832.0 12832 12832
29216.0 29216 29216
553504.0 553504 553504
8942112.0 8942112 8942112
76050976.0 76050976 76050976
210268704.0 210268704 210268704
4505236000.0 4505236000 4505236000
73224712736.0 73224712736 73224712736
622980526624.0 622980526624 622980526624
1722492154400.0 1722492154400 1722492154400
36906864243232.0 36906864243232 36906864243232
599856817664544.0 599856817664544 599856817664544
5103456445035040.0 5103456445035040 5103456445035040
1.4110655699776032e+16 14110655699776032 14110655699776032
3.0234103185148774e+17 302341031851487744 302341031851487776
4.914027050278875e+18 4914027050278875136 4914027050278875680
2.3360771123988427e+19 23360771123988426752 23360771123988427296
6.025425927140753e+19 60254259271407534080 60254259271407530528
1.3404123556624574e+20 134041235566245740544 134041235566245736992
2.4952244770010683e+21 2495224477001068314624 2495224477001068343840
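A minimal sketch of why the columns part ways, assuming (as the answers below conclude) that the cause is float64 rounding rather than the scientific-notation display, which only affects printing. Every power of two used here is exactly representable as a float64, which is why converting each y to an int before adding keeps vl_2_2 exact. The running float sum, however, stops being exact once its set bits span more than the 53 significant bits a float64 holds. The names exps, approx, first_bad_exp and bits_needed are illustrative, not from the question.

```python
import numpy as np

s = "00001000100111000010001001100001000100110000100010011000010001011100001"
exps = [i + 1 for i, c in enumerate(s) if c == "1"]  # exponent i+1 for every '1'

# float64 stores 52 fraction bits plus an implicit leading bit: 53 significant bits
mantissa_bits = np.finfo(np.float64).nmant + 1

# Each single term is exact, so converting y to int per term loses nothing
terms_exact = all(int(np.float64(2.0) ** e) == 2 ** e for e in exps)

exact = 0               # Python int: arbitrary precision, never rounds
approx = np.float64(0)  # same dtype as the values in Y
first_bad_exp = None
bits_needed = None
for e in exps:
    exact += 2 ** e
    approx += np.float64(2.0) ** e
    if first_bad_exp is None and int(approx) != exact:
        first_bad_exp = e
        lowest = (exact & -exact).bit_length() - 1  # index of the lowest set bit
        bits_needed = exact.bit_length() - lowest    # highest - lowest + 1

print(mantissa_bits, terms_exact, first_bad_exp, bits_needed)
# 53 True 58 54
```

This lines up with the printout: the columns agree through 14110655699776032 and first differ on the row where 2**58 is added, because at that point the partial sum needs 54 significant bits.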
Running your setup (I like to see actual values, not just a vague "not the same" claim):
In [70]: Y
Out[70]:
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
3.68934881e+19, 7.37869763e+19, 2.36118324e+21])
In [72]: X
Out[72]: bytearray(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x01\x01\x00\x00\x00\x00\x01')
In [73]: for i in range(len(sb)):
    ...:     vl_1 = vl_1+X[i]*2**(i+1)
    ...:
In [74]: vl_1
Out[74]: 2495224477001068343840
In [76]: for y in np.nditer(Y) :
    ...:     vl_2=vl_2+y
    ...:
In [77]: vl_2
Out[77]: 2.4952244770010683e+21
One is a float (after all, Y is float), but otherwise the values are the same (within float precision):
In [78]: vl_1-vl_2
Out[78]: 0.0
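The 0.0 can look as if the two results are identical, yet the question reports vl_2 == vl_1 as False. A small sketch with plain Python floats (the same IEEE 754 double format as np.float64) shows how both can hold: subtraction first converts the int to a float, while Python's int/float equality comparison is exact. I have not checked that a np.float64 scalar compares the same way on every NumPy version, so treat that part as an assumption.

```python
s = "00001000100111000010001001100001000100110000100010011000010001011100001"

# Exact value with Python ints (what vl_1 holds in the question)
exact = sum(2 ** (i + 1) for i, c in enumerate(s) if c == "1")
approx = float(exact)  # nearest double to the exact value

diff = exact - approx       # exact is converted to float first, so this is 0.0
same = exact == approx      # int/float equality is exact, so this is False
lost = exact - int(approx)  # the real rounding error, computed with integers

print(exact, diff, same, lost)
```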
nditer does nothing for you:
In [79]: vl_2=0
...: for y in Y : vl_2=vl_2+y
In [80]: vl_2
Out[80]: 2.4952244770010683e+21
But iterating over an array is slow, and you don't need to:
In [81]: np.sum(Y)
Out[81]: 2.4952244770010683e+21
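To check that nditer adds nothing here, a small sketch: np.nditer hands out 0-d array views, plain iteration over Y hands out np.float64 scalars, and both loops add the same values in the same order, so they reach the same float. Y is rebuilt from the question's string; the variable names are illustrative.

```python
import numpy as np

s = "00001000100111000010001001100001000100110000100010011000010001011100001"
table = bytearray.maketrans(b"01", b"\x00\x01")
X = bytearray(s, "ascii").translate(table)
Y = 2.0 ** (np.nonzero(X)[0] + 1)

first_nditer = next(iter(np.nditer(Y)))  # a 0-d ndarray view, not a scalar
first_plain = Y[0]                       # a np.float64 scalar

total_nditer = 0
for y in np.nditer(Y):
    total_nditer = total_nditer + y

total_plain = 0
for y in Y:
    total_plain = total_plain + y

print(type(first_nditer), first_nditer.ndim, type(first_plain))
print(total_nditer == total_plain, np.sum(Y))
```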
If you replace 2. with 2 when constructing Y:
In [95]: 2.**(np.nonzero(X)[0]+1)
Out[95]:
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
3.68934881e+19, 7.37869763e+19, 2.36118324e+21])
In [96]: 2**(np.nonzero(X)[0]+1)
Out[96]:
array([ 32, 512, 4096,
8192, 16384, 524288,
8388608, 67108864, 134217728,
4294967296, 68719476736, 549755813888,
1099511627776, 35184372088832, 562949953421312,
4503599627370496, 9007199254740992, 288230376151711744,
4611686018427387904, 0, 0,
0, 0], dtype=int64)
The second holds integer values, but the last 4 are too large for int64 and silently overflow to 0.
Skipping the last part of X, I get the same integer result both ways:
In [100]: sum(2**(np.nonzero(X[:-8])[0]+1))
Out[100]: 4914027050278875680
In [101]: sum([x*2**(i+1) for i,x in enumerate(X[:-8])])
Out[101]: 4914027050278875680
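The int64 limit can be checked directly. A sketch, using the illustrative names exps and kept, that lists which exponents exceed np.iinfo(np.int64).max and confirms that X[:-8] drops exactly those, so the remaining integer sum fits in an int64 array.

```python
import numpy as np

s = "00001000100111000010001001100001000100110000100010011000010001011100001"
exps = [i + 1 for i, c in enumerate(s) if c == "1"]  # exponent i+1 for every '1'

int64_max = np.iinfo(np.int64).max  # 2**63 - 1
too_big = [e for e in exps if 2 ** e > int64_max]

# X[:-8] keeps indices 0 .. len(s) - 9, i.e. exponents up to len(s) - 8
kept = [e for e in exps if e <= len(s) - 8]
partial = sum(2 ** e for e in kept)

# With only the kept exponents, an explicit int64 array gives the same result
vectorized = int(np.sum(2 ** np.array(kept, dtype=np.int64)))

print(too_big, max(kept), partial == vectorized)
# [64, 65, 66, 71] 62 True
```

The dtype is set explicitly because the default integer dtype has been 32-bit on Windows in older NumPy releases.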
The other answer suggested going with object dtype. While it may work, it loses most of the speed advantages of working with numeric dtype arrays.
As proposed in the other answer, converting the nonzero results to object dtype produces Python ints that are large enough:
In [166]: 2**(np.nonzero(X)[0]+1).astype(object)
Out[166]:
array([32, 512, 4096, 8192, 16384, 524288, 8388608, 67108864, 134217728,
4294967296, 68719476736, 549755813888, 1099511627776,
35184372088832, 562949953421312, 4503599627370496,
9007199254740992, 288230376151711744, 4611686018427387904,
18446744073709551616, 36893488147419103232, 73786976294838206464,
2361183241434822606848], dtype=object)
Some comparative times:
In [167]: timeit np.sum(2**(np.nonzero(X)[0]+1).astype(object))
46.5 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
The approximate float approach:
In [168]: timeit np.sum(2.**(np.nonzero(X)[0]+1))
32.3 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
The enumerated list:
In [169]: timeit sum([x*2**(i+1) for i,x in enumerate(X)])
43.1 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
Working with an object dtype array doesn't help, speedwise.
The list version of nonzero_bits is even faster:
In [173]: %%timeit
...: nonzero_bits = [i for i, x in enumerate(X) if x != 0]
...: vl = sum(2 ** (i + 1) for i in nonzero_bits)
18.9 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
First, using numpy while still iterating with a for loop is not vectorization and will not improve performance (it will be even worse, because of the numpy array instantiation overhead).
Second, you're handling very large numbers, beyond what numpy's native fixed-size C types can hold. Native Python int can handle them, so you need to specify dtype=object to keep numpy from casting them (see https://stackoverflow.com/a/37272717/13636407 ).
Even then, because of dtype=object, numpy can't vectorize, so there is no performance gain from using numpy, as @hpaulj noticed (see the performance tests below).
import numpy as np

def using_list(s):
    X = to_bytearray(s)
    nonzero_bits = [i for i, x in enumerate(X) if x != 0]
    return sum(2 ** (i + 1) for i in nonzero_bits)

def using_numpy(s):
    # the numbers are too large for int64, so convert to dtype=object
    # see https://stackoverflow.com/a/37272717/13636407
    X = to_bytearray(s)
    nonzero_bits = np.nonzero(X)[0].astype(object)
    return np.sum(2 ** (nonzero_bits + 1))

table = bytearray.maketrans(b"01", b"\x00\x01")

def to_bytearray(s):
    return bytearray(s, "ascii").translate(table)
Equality check:
s = "00001000100111000010001001100001000100110000100010011000010001011100001"
vl_list = using_list(s)
vl_numpy = using_numpy(s)
assert vl_list == vl_numpy
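Another variant, not from either answer, that may be worth adding to the comparison: since s[0] is the least significant bit and every term carries an extra factor of 2, the whole sum equals twice the integer whose binary digits are s reversed, and Python's built-in int() can parse that directly. using_int_parse is a hypothetical name, and I have not timed it against the functions above.

```python
def using_int_parse(s):
    # s[i] is the bit for 2**(i+1): reverse so s[0] becomes the lowest
    # binary digit, parse base 2, then shift left once for the +1 exponent
    return int(s[::-1], 2) << 1


s = "00001000100111000010001001100001000100110000100010011000010001011100001"
reference = sum(2 ** (i + 1) for i, c in enumerate(s) if c == "1")
print(using_int_parse(s) == reference)  # True
```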
Performance tests:
>>> %timeit using_list(s)
... %timeit using_numpy(s)
... print()
... %timeit using_list(s * 10)
... %timeit using_numpy(s * 10)
... print()
... %timeit using_list(s * 100)
... %timeit using_numpy(s * 100)
10.1 µs ± 81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
18.1 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
128 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
104 µs ± 605 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
9.88 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.77 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
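The timings above use IPython's %timeit. A rough plain-Python equivalent with the standard timeit module is sketched below, reusing the answer's using_list and using_numpy; the number/repeat counts are arbitrary choices, and absolute numbers will depend on the machine.

```python
import timeit

import numpy as np

table = bytearray.maketrans(b"01", b"\x00\x01")


def to_bytearray(s):
    return bytearray(s, "ascii").translate(table)


def using_list(s):
    X = to_bytearray(s)
    nonzero_bits = [i for i, x in enumerate(X) if x != 0]
    return sum(2 ** (i + 1) for i in nonzero_bits)


def using_numpy(s):
    X = to_bytearray(s)
    nonzero_bits = np.nonzero(X)[0].astype(object)  # Python ints, no overflow
    return np.sum(2 ** (nonzero_bits + 1))


s = "00001000100111000010001001100001000100110000100010011000010001011100001"

results = {}
for factor in (1, 10, 100):
    data = s * factor
    for func in (using_list, using_numpy):
        # best of 3 repeats, each timing 10 calls; keep the per-call seconds
        best = min(timeit.repeat(lambda: func(data), number=10, repeat=3)) / 10
        results[(func.__name__, factor)] = best
        print(f"{func.__name__:12} x{factor:<4} {best * 1e6:10.1f} µs per call")
```

Taking the minimum of the repeats follows the timeit documentation's advice that the lowest value is usually the most representative.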