Read/write NumPy structured array very slow, linear in size
To my surprise I have discovered that reading from and writing to NumPy structured arrays seems to be linear in the size of the array.
As this seems very wrong, I would like to know whether I am doing something wrong here or whether this might be a bug.
Here is some example code:
import numpy as np

def test():
    # Python 2 code (hence xrange); A and B differ only in the size of 'b'
    A = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 100))])
    B = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 10000))])
    C = [{'a': 0, 'b': [0 for i in xrange(100)]}]
    D = [{'a': 0, 'b': [0 for i in xrange(10000)]}]
    for i in range(100):
        A[0]['a'] = 1
        B[0]['a'] = 1
        B['a'][0] = 1
        x = A[0]['a']
        x = B[0]['a']
        C[0]['a'] = 1
        D[0]['a'] = 1
Line profiling gives the following results:
Total time: 5.28901 s, Timer unit: 1e-06 s
Function: test at line 454
Line # Hits Time Per Hit % Time Line Contents
==============================================================
454 @profile
455 def test():
456
457 1 10 10.0 0.0 A = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1,100))])
458 1 13 13.0 0.0 B = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1,10000))])
459
460 101 39 0.4 0.0 C = [{'a':0, 'b':[0 for i in xrange(100)]}]
461 10001 3496 0.3 0.1 D = [{'a':0, 'b':[0 for i in xrange(10000)]}]
462
463 101 54 0.5 0.0 for i in range(100):
464 100 20739 207.4 0.4 A[0]['a'] = 1
465 100 1741699 17417.0 32.9 B[0]['a'] = 1
466
467 100 1742374 17423.7 32.9 B['a'][0] = 1
468 100 20750 207.5 0.4 x = A[0]['a']
469 100 1759634 17596.3 33.3 x = B[0]['a']
470
471 100 123 1.2 0.0 C[0]['a'] = 1
472 100 76 0.8 0.0 D[0]['a'] = 1
As you can see, I don't even access the larger field (although a size of 10,000 is actually really tiny...). BTW: same behavior for shape=(10000,1) instead of (1,10000).
Any ideas?
Interpreting a structured array as a list of dicts and comparing against the built-in types, the computational cost is independent of size, as expected (see C and D).
NumPy version: 1.10.1.
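For anyone who wants to reproduce this comparison without line_profiler, the same measurement can be sketched with the stdlib timeit module (Python 3 syntax, so range instead of xrange; the array names mirror the ones above):

```python
import timeit

import numpy as np

# Same layout as A and B above: a tiny scalar field 'a' next to a 'b'
# field that differs only in size between the two arrays.
A = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 100))])
B = np.zeros(1, dtype=[('a', np.int16), ('b', np.int16, (1, 10000))])

# Time a scalar write to the small 'a' field of each array. On an affected
# NumPy (1.10.1) t_large comes out orders of magnitude larger than t_small;
# on fixed versions the two are essentially equal.
t_small = timeit.timeit("A[0]['a'] = 1", globals={'A': A}, number=10000)
t_large = timeit.timeit("B[0]['a'] = 1", globals={'B': B}, number=10000)
print("A (b has   100 elements): %.4f s" % t_small)
print("B (b has 10000 elements): %.4f s" % t_large)
```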
This is a known issue with structured arrays in NumPy 1.10.1. The conversation in the issue log seems to indicate it is fixed in all more recent NumPy versions, including 1.10.2 and 1.11.0.
Updating NumPy should make the problem go away.
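As a quick sanity check before and after upgrading, the installed version can be inspected programmatically; a minimal sketch:

```python
import numpy as np

# The slowdown discussed here is specific to NumPy 1.10.1; per the issue
# log, 1.10.2 and later are unaffected.
print(np.__version__)
if np.__version__ == '1.10.1':
    print('Affected release -- consider upgrading, e.g. pip install -U numpy')
else:
    print('Not the affected 1.10.1 release')
```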
With timeit in ipython I get essentially the same times for A and B:
In [30]: timeit A[0]['a']=1
1000000 loops, best of 3: 1.9 µs per loop
In [31]: timeit B[0]['a']=1
1000000 loops, best of 3: 1.87 µs per loop
In [32]: timeit B['a'][0]=1
1000000 loops, best of 3: 554 ns per loop
In [33]: timeit x=A[0]['a']
1000000 loops, best of 3: 1.74 µs per loop
In [34]: timeit x=B[0]['a']
1000000 loops, best of 3: 1.73 µs per loop
Even if I create B with 100 records, the times don't change:
In [39]: timeit B['a']=1 # set 100 values at once
1000000 loops, best of 3: 1.08 µs per loop
In [40]: timeit B['a'][10]=1
1000000 loops, best of 3: 540 ns per loop
In [41]: B.shape # 2Mb size
Out[41]: (100,)
Even setting 10,000 values of the 'b' field isn't expensive:
In [46]: B['b'].shape
Out[46]: (100, 1, 10000)
In [47]: B['b'][:,:,:100]=1
In [48]: timeit B['b'][:,:,:100]=1
100000 loops, best of 3: 10.7 µs per loop
In [49]: B['b'].sum()
Out[49]: 10000
In [50]: np.__version__
Out[50]: '1.11.0'
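One side observation from the timings above, independent of the 1.10.1 bug: indexing the field first (B['a'][0]) is consistently faster than indexing the record first (B[0]['a']), because B[0] materialises a np.void record scalar spanning the whole row before the field is extracted, while B['a'] is a cheap view of just that column. A small sketch:

```python
import numpy as np

B = np.zeros(100, dtype=[('a', np.int16), ('b', np.int16, (1, 10000))])

# Field-first: B['a'] is a view of the 'a' column; the element write is a
# plain scalar assignment into that view.
B['a'][0] = 1

# Record-first: B[0] wraps the entire ~20 kB record in a np.void scalar
# before 'a' is touched. The record scalar is still a view into B, so the
# assignment does propagate to the array.
B[0]['a'] = 2

print(B['a'][0])  # both styles write through to the same memory
```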