Performance of np.empty, np.zeros and np.ones
I was curious about how much difference it really makes to use np.empty instead of np.zeros, and also about the difference with respect to np.ones. I ran this small script to benchmark the time each of them takes to create a large array:
import numpy as np
from timeit import timeit

N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
rep = 100  # repetitions per measurement

print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s}')
for dtype in dtypes:
    name = dtype.__name__
    # Average time in seconds for one array creation
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e}')
And obtained the following table as a result (average time in seconds per call):
DType Empty Zeros Ones
int8 1.39e-04 1.76e-04 5.27e-03
int16 3.72e-04 3.59e-04 1.09e-02
int32 5.85e-04 5.81e-04 2.16e-02
int64 1.28e-03 1.13e-03 3.98e-02
uint8 1.66e-04 1.62e-04 5.22e-03
uint16 2.79e-04 2.82e-04 9.49e-03
uint32 5.65e-04 5.20e-04 1.99e-02
uint64 1.16e-03 1.24e-03 4.18e-02
float16 3.21e-04 2.95e-04 1.06e-02
float32 6.31e-04 6.06e-04 2.32e-02
float64 1.18e-03 1.16e-03 4.85e-02
From this I extract two somewhat surprising conclusions:
First, there is virtually no difference between the performance of np.empty and np.zeros, maybe excepting some difference for int8. I don't understand why this is the case.
Second, there is a great difference between np.zeros and np.ones. I suspect this has to do with high-performance means for memory zeroing that do not apply to filling a memory area with a constant, but I don't really know how or at what level that works.
What is the explanation for these results?
I am using NumPy 1.15.4 and Python 3.6 (Anaconda) on Windows 10 (with MKL), and I have an Intel Core i7-7700K CPU.
EDIT: As per a suggestion in the comments, I tried running the benchmark interleaving each individual trial and averaging at the end, but I couldn't see a significant difference in the results. On a related note, though, I don't know if there are any mechanisms in NumPy to reuse the memory of a just-deleted array, which would make the measurements unrealistic (although the times do seem to go up with the data type size even for empty arrays).
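The interleaved variant mentioned in the edit is not shown in the question, so the following is only a minimal sketch of what it could look like. The function name `interleaved_benchmark` and its parameters are made up for illustration. Each round runs every constructor once, so any slow drift (allocator state, caches, CPU frequency) is spread over all three rather than hitting only one of them.

```python
import numpy as np
from timeit import timeit


def interleaved_benchmark(n=1_000_000, dtype=np.float64, rounds=20):
    """Time np.empty, np.zeros and np.ones in round-robin order.

    Returns a dict mapping constructor name to mean seconds per call.
    The name and signature are hypothetical, not taken from the question.
    """
    makers = {
        "empty": lambda: np.empty(n, dtype=dtype),
        "zeros": lambda: np.zeros(n, dtype=dtype),
        "ones": lambda: np.ones(n, dtype=dtype),
    }
    totals = {name: 0.0 for name in makers}
    for _ in range(rounds):
        # One single-shot timing per constructor in every round
        for name, make in makers.items():
            totals[name] += timeit(make, number=1)
    # Average only at the end, after all rounds are done
    return {name: total / rounds for name, total in totals.items()}


if __name__ == "__main__":
    for name, seconds in interleaved_benchmark().items():
        print(f"{name:6s} {seconds:10.2e}")
```

This does not address the memory-reuse concern: each array is still freed right after creation, so the allocator may hand the same block back on the next call.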
This should really be a comment but it won't fit. Here is a small extension of your script, with some "hand-made" versions of zeros and ones.
import numpy as np
from timeit import timeit

N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
rep = 100  # repetitions per measurement

print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s}')
for dtype in dtypes:
    name = dtype.__name__
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    # "Hand-made" versions: np.full, and np.empty followed by np.copyto
    time_full_zeros = timeit(lambda: np.full(N, 0, dtype=dtype), number=rep) / rep
    time_full_ones = timeit(lambda: np.full(N, 1, dtype=dtype), number=rep) / rep
    time_empty_zeros = timeit(lambda: np.copyto(np.empty(N, dtype=dtype), 0), number=rep) / rep
    time_empty_ones = timeit(lambda: np.copyto(np.empty(N, dtype=dtype), 1), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e} {time_full_zeros:10.2e} {time_full_ones:10.2e} {time_empty_zeros:10.2e} {time_empty_ones:10.2e} ')
The timings are suggestive.
DType Empty Zeros Ones Full_zeros Full_ones Empty_zeros Empty_ones
int8 1.37e-06 6.33e-04 5.73e-04 5.76e-04 5.73e-04 6.05e-04 5.82e-04
int16 1.61e-06 1.55e-03 3.54e-03 3.54e-03 3.56e-03 3.54e-03 3.54e-03
int32 7.22e-06 6.99e-06 1.24e-02 1.20e-02 1.25e-02 1.19e-02 1.21e-02
int64 8.26e-06 8.06e-06 2.62e-02 2.64e-02 2.61e-02 2.62e-02 2.62e-02
uint8 1.32e-06 6.30e-04 5.85e-04 5.86e-04 5.77e-04 5.70e-04 5.83e-04
uint16 1.32e-06 1.63e-03 3.61e-03 3.65e-03 4.08e-03 4.08e-03 3.58e-03
uint32 7.08e-06 7.20e-06 1.48e-02 1.41e-02 1.63e-02 1.44e-02 1.32e-02
uint64 7.14e-06 7.13e-06 2.69e-02 2.67e-02 2.82e-02 2.68e-02 2.72e-02
float16 1.31e-06 1.55e-03 3.56e-03 3.79e-03 3.54e-03 3.53e-03 3.55e-03
float32 7.11e-06 6.95e-06 1.36e-02 1.35e-02 1.37e-02 1.35e-02 1.37e-02
float64 7.27e-06 7.33e-06 3.13e-02 3.00e-02 2.75e-02 2.80e-02 2.75e-02
Re zeros being faster than ones: I seem to remember that, as suggested in the comments, zeros indeed uses calloc, which, being a system routine with the sole purpose of allocating blocks of zeros, is probably good at that.
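One way to probe the calloc idea is to separate the cost of asking for the array from the cost of first writing to it. This is a rough sketch, and it rests on an assumption the timings above do not prove: that the operating system hands out pages which are only zeroed on first access. If that holds, np.zeros should return quickly for large arrays and the deferred cost should show up in the first full write. The helper name `time_alloc_and_touch` is hypothetical.

```python
import numpy as np
from timeit import default_timer as timer


def time_alloc_and_touch(n=10_000_000, dtype=np.float64):
    """Return (allocation_time, first_write_time, array) for np.zeros.

    Hypothetical helper for illustration; times are in seconds.
    """
    start = timer()
    arr = np.zeros(n, dtype=dtype)
    alloc_time = timer() - start

    start = timer()
    arr += 1  # writes every element, so every page must really exist
    touch_time = timer() - start
    return alloc_time, touch_time, arr


if __name__ == "__main__":
    alloc_time, touch_time, _ = time_alloc_and_touch()
    print(f"np.zeros call: {alloc_time:10.2e} s")
    print(f"first write:   {touch_time:10.2e} s")
```

The numbers depend heavily on platform, allocator and array size, so a single run is a hint rather than evidence. Note too that the first write includes the cost of the addition itself.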