numpy array error: summing the elements gives the wrong output

Question

If I sum over an array of 0s and 1s in a plain Python loop, I get a different result than when I do the same thing through a numpy array. Why is that happening, and what is the solution? The code is given below:

import numpy as np

vl_1 = vl_2 = 0
string_1 = "00001000100111000010001001100001000100110000100010011000010001011100001"
sb = string_1
# map the characters '0'/'1' to the byte values 0/1
table = bytearray.maketrans(b'01', b'\x00\x01')
X = bytearray(sb, "ascii").translate(table)
# 2.0 ** (position + 1) for every nonzero bit; the float base makes Y a float array
Y = 2.**(np.nonzero(X)[0] + 1)
for i in range(len(sb)):
    vl_1 = vl_1 + X[i] * 2**(i + 1)
for y in np.nditer(Y):
    vl_2 = vl_2 + y

Note that I am doing the same math operation in both loops, so vl_2 == vl_1 should be True, but I get False.

Edit:

This problem occurred in vectorized code, so speed is an issue, and any solution should take that into account. The solution should therefore be based on numpy rather than on some other, more time-consuming approach.

Answer 1

The loop over np.nditer(Y) adds up floats, which are shown in scientific notation once they get large and can no longer hold every digit, and that throws off the calculation a little bit. I changed the loop slightly:

vl_2_2 = 0
for y in np.nditer(Y):
    vl_2 = vl_2 + y
    vl_2_2 = vl_2_2 + int(y.item())
    print(f'{vl_2} {int(vl_2)} {vl_2_2}')

vl_2 is the original.

vl_2_2 does the calculation after converting y to an int.

In the printout I also print vl_2 as an int after the calculation.

The results are the same in both loops until shortly after the float total switches to scientific notation, where it can no longer hold every digit:

First loop (without duplicates):

32
544
4640
12832
29216
553504
8942112
76050976
210268704
4505236000
73224712736
622980526624
1722492154400
36906864243232
599856817664544
5103456445035040
14110655699776032
302341031851487776
4914027050278875680
23360771123988427296
60254259271407530528
134041235566245736992
2495224477001068343840

Second loop (the first number in each row is the original vl_2):

32.0 32 32
544.0 544 544
4640.0 4640 4640
12832.0 12832 12832
29216.0 29216 29216
553504.0 553504 553504
8942112.0 8942112 8942112
76050976.0 76050976 76050976
210268704.0 210268704 210268704
4505236000.0 4505236000 4505236000
73224712736.0 73224712736 73224712736
622980526624.0 622980526624 622980526624
1722492154400.0 1722492154400 1722492154400
36906864243232.0 36906864243232 36906864243232
599856817664544.0 599856817664544 599856817664544
5103456445035040.0 5103456445035040 5103456445035040
1.4110655699776032e+16 14110655699776032 14110655699776032
3.0234103185148774e+17 302341031851487744 302341031851487776
4.914027050278875e+18 4914027050278875136 4914027050278875680
2.3360771123988427e+19 23360771123988426752 23360771123988427296
6.025425927140753e+19 60254259271407534080 60254259271407530528
1.3404123556624574e+20 134041235566245740544 134041235566245736992
2.4952244770010683e+21 2495224477001068314624 2495224477001068343840
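The divergence in the printout is not about how the number is displayed: Y holds 64-bit floats, which carry a 53-bit significand, so an integer beyond 2**53 is stored exactly only when it happens to land on a representable value. Below is a minimal pure-Python sketch, assuming that Python's built-in float (also a 64-bit double) stands in for the float values in Y; the helper name first_divergence is made up for illustration.

```python
def first_divergence(bits):
    """Return the 0-based index of the first term at which a float running
    total stops matching the exact integer running total, or None."""
    exponents = [i + 1 for i, ch in enumerate(bits) if ch == "1"]
    exact_total = 0      # Python int: arbitrary precision
    float_total = 0.0    # 64-bit double: 53-bit significand
    for n, e in enumerate(exponents):
        exact_total += 2 ** e
        float_total += 2.0 ** e
        # Python compares an int with a float by exact value,
        # so this test detects any rounding in float_total
        if exact_total != float_total:
            return n
    return None


s = "00001000100111000010001001100001000100110000100010011000010001011100001"

# Every integer up to 2**53 is representable; 2**53 + 1 is the first that is not
assert float(2 ** 53) == float(2 ** 53 + 1)

print(first_divergence(s))  # 17
```

With the question's string this reports index 17, the term 2**58, which is the first row of the second printout where the middle column no longer matches the last one.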

Answer 2

With your setup (I like to see some values, not just a vague "not the same" claim):

In [70]: Y
Out[70]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])


In [72]: X
Out[72]: bytearray(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x01\x01\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x01\x01\x01\x00\x00\x00\x00\x01')

In [73]: for i in range(len(sb)): 
    ...:                     vl_1 = vl_1+X[i]*2**(i+1)
    ...:                     

In [74]: vl_1
Out[74]: 2495224477001068343840


In [76]: for y in np.nditer(Y)  :
    ...:                     vl_2=vl_2+y
    ...:                     

In [77]: vl_2
Out[77]: 2.4952244770010683e+21

One is a float (after all, Y is float), but otherwise the values are the same (within float precision):

In [78]: vl_1-vl_2
Out[78]: 0.0
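A difference of 0.0 does not mean the two values compare equal. Here is a small sketch in plain Python, assuming the built-in float behaves like the float value held in vl_2: subtracting a float from an int first rounds the int to a float, so the lost low-order digits vanish from both sides, whereas Python compares an int with a float by exact value.

```python
vl_1 = 2495224477001068343840   # exact integer result from the first loop
vl_2 = float(vl_1)              # nearest 64-bit float, standing in for the float sum

print(vl_1 - vl_2)        # 0.0: vl_1 is rounded to a float before subtracting
print(vl_1 == vl_2)       # False: int/float comparison uses the exact values
print(vl_1 - int(vl_2))   # 29216: the part the float cannot hold
```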

nditer does nothing for you:

In [79]: vl_2=0
    ...: for y in Y  : vl_2=vl_2+y

In [80]: vl_2
Out[80]: 2.4952244770010683e+21

but iterating on arrays is slower. You don't need it:

In [81]: np.sum(Y)
Out[81]: 2.4952244770010683e+21

edit

If you replace 2. with 2 when constructing Y:

In [95]: 2.**(np.nonzero(X)[0]+1)
Out[95]: 
array([3.20000000e+01, 5.12000000e+02, 4.09600000e+03, 8.19200000e+03,
       1.63840000e+04, 5.24288000e+05, 8.38860800e+06, 6.71088640e+07,
       1.34217728e+08, 4.29496730e+09, 6.87194767e+10, 5.49755814e+11,
       1.09951163e+12, 3.51843721e+13, 5.62949953e+14, 4.50359963e+15,
       9.00719925e+15, 2.88230376e+17, 4.61168602e+18, 1.84467441e+19,
       3.68934881e+19, 7.37869763e+19, 2.36118324e+21])

In [96]: 2**(np.nonzero(X)[0]+1)
Out[96]: 
array([                 32,                 512,                4096,
                      8192,               16384,              524288,
                   8388608,            67108864,           134217728,
                4294967296,         68719476736,        549755813888,
             1099511627776,      35184372088832,     562949953421312,
          4503599627370496,    9007199254740992,  288230376151711744,
       4611686018427387904,                   0,                   0,
                         0,                   0], dtype=int64)

The second holds integer values, but the last 4 are too large for int64.
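The zeros in the int64 output are what wraparound would produce. As a rough illustration in plain Python (not numpy itself), assuming fixed-width signed 64-bit arithmetic that keeps only the low 64 bits, the hypothetical helper wrap_int64 below reproduces them: any power 2**k with k >= 64 has no bits left in the low 64.

```python
def wrap_int64(value):
    """Reduce an exact Python int to the value a signed 64-bit integer
    would hold if only the low 64 bits were kept (two's complement)."""
    low = value & (2 ** 64 - 1)          # keep the low 64 bits
    return low - 2 ** 64 if low >= 2 ** 63 else low


# The last exponents that fit, followed by the four that do not
for k in (58, 62, 64, 65, 66, 71):
    print(k, wrap_int64(2 ** k))
```

The exponents 64, 65, 66 and 71 all come out as 0 here, matching the last four entries of the int64 array.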

Skipping the last part of X, I get the same integer result:

In [100]: sum(2**(np.nonzero(X[:-8])[0]+1))
Out[100]: 4914027050278875680

In [101]: sum([x*2**(i+1) for i,x in enumerate(X[:-8])])
Out[101]: 4914027050278875680

The other answer suggested going with object dtype. While it may work, it loses most of the speed advantages of working with numeric dtype arrays.

object speed

As proposed in the other answer, converting the nonzero results to object produces Python ints that are large enough:

In [166]: 2**(np.nonzero(X)[0]+1).astype(object)
Out[166]: 
array([32, 512, 4096, 8192, 16384, 524288, 8388608, 67108864, 134217728,
       4294967296, 68719476736, 549755813888, 1099511627776,
       35184372088832, 562949953421312, 4503599627370496,
       9007199254740992, 288230376151711744, 4611686018427387904,
       18446744073709551616, 36893488147419103232, 73786976294838206464,
       2361183241434822606848], dtype=object)

Some comparative times:

In [167]: timeit np.sum(2**(np.nonzero(X)[0]+1).astype(object))
46.5 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The approximate float approach:

In [168]: timeit np.sum(2.**(np.nonzero(X)[0]+1))
32.3 µs ± 1.12 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

The enumerated list:

In [169]: timeit sum([x*2**(i+1) for i,x in enumerate(X)])
43.1 µs ± 1.14 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

Working with an object dtype array doesn't help, speedwise.

The list version of nonzero_bits is even faster:

In [173]: %%timeit
     ...: nonzero_bits = [i for i, x in enumerate(X) if x != 0]
     ...: vl = sum(2 ** (i + 1) for i in nonzero_bits)
18.9 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

Answer 3

First, using numpy while still writing a for loop is not vectorization and will not improve performance (it will even be worse, because of the numpy array instantiation overhead).

Second, you're handling very large numbers, beyond the capacity of numpy's native C integer types, but native Python int can handle them, so you need to specify dtype=object to keep numpy from casting them (see https://stackoverflow.com/a/37272717/13636407).

Even then, because of dtype=object, numpy can't vectorize, so there is no performance gain from using numpy, as @hpaulj noticed (see performance tests below).

import numpy as np

def using_list(s):
    X = to_bytearray(s)
    nonzero_bits = [i for i, x in enumerate(X) if x != 0]
    return sum(2 ** (i + 1) for i in nonzero_bits)

def using_numpy(s):
    # because large numbers, need to convert to dtype=object
    # see https://stackoverflow.com/a/37272717/13636407
    X = to_bytearray(s)
    nonzero_bits = np.nonzero(X)[0].astype(object)
    return np.sum(2 ** (nonzero_bits + 1))

table = bytearray.maketrans(b"01", b"\x00\x01")

def to_bytearray(s):
    return bytearray(s, "ascii").translate(table)

Equality check:

s = "00001000100111000010001001100001000100110000100010011000010001011100001"

vl_list = using_list(s)
vl_numpy = using_numpy(s)

assert vl_list == vl_numpy

Performance tests:

>>> %timeit using_list(s)
... %timeit using_numpy(s)
... print()
... %timeit using_list(s * 10)
... %timeit using_numpy(s * 10)
... print()
... %timeit using_list(s * 100)
... %timeit using_numpy(s * 100)

10.1 µs ± 81 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
18.1 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

128 µs ± 700 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
104 µs ± 605 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

9.88 ms ± 14.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.77 ms ± 108 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
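Since the sum is just the bit string read as a binary number with the first character as the least significant bit, then doubled, Python's built-in int can also do the whole conversion in one step. This is a different technique from the ones in the answers, offered only as a sketch; the function name using_int is made up here, and no timing is claimed for it.

```python
def using_int(s):
    # Reverse so the first character becomes the least significant bit,
    # parse as base 2, then shift left by one because bit i has weight 2**(i + 1).
    return int(s[::-1], 2) << 1


s = "00001000100111000010001001100001000100110000100010011000010001011100001"

# Reference value: the explicit sum over the nonzero bits
expected = sum(2 ** (i + 1) for i, ch in enumerate(s) if ch == "1")
assert using_int(s) == expected

print(using_int(s))  # 2495224477001068343840
```

The result is an exact Python int, so it does not hit the float rounding or the int64 limit discussed above. Note that int() raises ValueError for an empty string, so that case would need separate handling.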

Answer 1 0 2022-07-04 13:55:56

Answer 2 0 2022-07-04 14:45:11

Answer 3 0 accepted 2022-07-04 14:47:54
