
Performance of np.empty, np.zeros and np.ones

I was curious about how much difference it really makes to use np.empty instead of np.zeros, and also how both compare with np.ones. I ran this small script to benchmark the time each of them takes to create a large array:

import numpy as np
from timeit import timeit

N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
rep = 100
print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s}')
for dtype in dtypes:
    name = dtype.__name__
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e}')

I obtained the following table as a result (average time per call, in seconds):

DType         Empty      Zeros       Ones
int8       1.39e-04   1.76e-04   5.27e-03
int16      3.72e-04   3.59e-04   1.09e-02
int32      5.85e-04   5.81e-04   2.16e-02
int64      1.28e-03   1.13e-03   3.98e-02
uint8      1.66e-04   1.62e-04   5.22e-03
uint16     2.79e-04   2.82e-04   9.49e-03
uint32     5.65e-04   5.20e-04   1.99e-02
uint64     1.16e-03   1.24e-03   4.18e-02
float16    3.21e-04   2.95e-04   1.06e-02
float32    6.31e-04   6.06e-04   2.32e-02
float64    1.18e-03   1.16e-03   4.85e-02
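
As a rough sanity check (not part of the original question), dividing the array size in bytes by the measured time gives the implied write rate. The figures below are copied from the float64 row of the table above. The helper name `implied_rate_gbps` is hypothetical.

```python
N = 10_000_000   # elements, as in the benchmark script
ITEMSIZE = 8     # bytes per float64 element

def implied_rate_gbps(seconds: float, n: int = N, itemsize: int = ITEMSIZE) -> float:
    """Array size in bytes divided by elapsed time, in GB/s (1 GB = 1e9 bytes)."""
    return n * itemsize / seconds / 1e9

# float64 row of the table: Empty, Zeros, Ones (seconds per call)
for label, t in [("empty", 1.18e-03), ("zeros", 1.16e-03), ("ones", 4.85e-02)]:
    print(f"{label:6s} {implied_rate_gbps(t):8.1f} GB/s")
```

This works out to roughly 68 to 69 GB/s for np.empty and np.zeros and about 1.6 GB/s for np.ones. Rates that high are unlikely to come from 80 MB actually being written. That hints that neither np.empty nor np.zeros touches the memory during the call, whereas np.ones has to write every byte.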

From this I draw two somewhat surprising conclusions:

  • There is virtually no difference between the performance of np.empty and np.zeros, except perhaps for int8. I don't understand why this is the case. Creating an empty array is supposed to be faster, and I have actually seen reports of that (e.g. Speed of np.empty vs np.zeros).
  • There is a big difference between np.zeros and np.ones. I suspect this has to do with high-performance mechanisms for zeroing memory that do not apply to filling a memory area with a constant, but I don't really know how, or at what level, that works.
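
To put the second point in context: as far as I know, np.ones is defined in Python (in numpy/core/numeric.py in NumPy 1.x) as an np.empty followed by an np.copyto of the value 1. A zero-filled allocation can skip the write, but np.ones always writes the whole buffer. The sketch below reproduces that pattern. `ones_like_numpy` is a hypothetical name, and the exact source differs between NumPy versions.

```python
import numpy as np

def ones_like_numpy(shape, dtype=None, order="C"):
    """Sketch of how np.ones is built: allocate, then write 1 into every element."""
    a = np.empty(shape, dtype=dtype, order=order)  # uninitialized buffer, nothing written
    np.copyto(a, 1, casting="unsafe")              # one full pass over the buffer
    return a

a = ones_like_numpy((3, 4), dtype=np.float16)
print(a.dtype, a.shape, a.sum())
```

The fill pass writes all N * itemsize bytes, so its cost should grow with the dtype size. That matches the Ones column, which roughly doubles from int8 to int16 to int32 to int64.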

What is the explanation for these results?

I am using NumPy 1.15.4 and Python 3.6 (Anaconda) on Windows 10 with MKL, on an Intel Core i7-7700K CPU.

EDIT: As suggested in the comments, I tried running the benchmark with the individual trials interleaved and averaging at the end, but I couldn't see a significant difference in the results. On a related note, though, I don't know whether NumPy has any mechanism to reuse the memory of a just-deleted array, which would make the measurements unrealistic (although the times do seem to grow with the data type size, even for empty arrays).
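
One way to interleave the trials as described in the edit is to run one call of each constructor per round and average at the end. This is only a sketch. The function name `interleaved_benchmark` and its parameters are illustrative, and it uses time.perf_counter instead of timeit so that each call can be timed individually.

```python
import time
import numpy as np

def interleaved_benchmark(n, dtype, rounds=100):
    """Time np.empty, np.zeros and np.ones with one call of each per round, so that
    slow drifts in machine state affect all three constructors similarly."""
    makers = {
        "empty": lambda: np.empty(n, dtype=dtype),
        "zeros": lambda: np.zeros(n, dtype=dtype),
        "ones": lambda: np.ones(n, dtype=dtype),
    }
    totals = dict.fromkeys(makers, 0.0)
    for _ in range(rounds):
        for name, make in makers.items():
            start = time.perf_counter()
            arr = make()
            totals[name] += time.perf_counter() - start
            del arr  # release the buffer before the next allocation, as timeit would
    return {name: total / rounds for name, total in totals.items()}

for dtype in (np.int8, np.float64):
    avg = interleaved_benchmark(10_000_000, dtype, rounds=20)
    print(dtype.__name__, {k: f"{v:.2e}" for k, v in avg.items()})
```

This does not rule out the allocator handing back the same block each time. To probe the memory-reuse question, you could compare `arr.ctypes.data` (the buffer's address) across consecutive allocations. Whether addresses repeat depends on the allocator and platform.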

This should really be a comment, but it won't fit. Here is a small extension of your script, with some "hand-made" versions of zeros and ones:

import numpy as np
from timeit import timeit

N = 10_000_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]
rep = 100
print(f'{"DType":8s} {"Empty":>10s} {"Zeros":>10s} {"Ones":>10s} '
      f'{"FullZeros":>10s} {"FullOnes":>10s} {"EmptyZeros":>10s} {"EmptyOnes":>10s}')
for dtype in dtypes:
    name = dtype.__name__
    time_empty = timeit(lambda: np.empty(N, dtype=dtype), number=rep) / rep
    time_zeros = timeit(lambda: np.zeros(N, dtype=dtype), number=rep) / rep
    time_ones = timeit(lambda: np.ones(N, dtype=dtype), number=rep) / rep
    time_full_zeros = timeit(lambda: np.full(N, 0, dtype=dtype), number=rep) / rep
    time_full_ones = timeit(lambda: np.full(N, 1, dtype=dtype), number=rep) / rep
    time_empty_zeros = timeit(lambda: np.copyto(np.empty(N, dtype=dtype), 0), number=rep) / rep
    time_empty_ones = timeit(lambda: np.copyto(np.empty(N, dtype=dtype), 1), number=rep) / rep
    print(f'{name:8s} {time_empty:10.2e} {time_zeros:10.2e} {time_ones:10.2e} '
          f'{time_full_zeros:10.2e} {time_full_ones:10.2e} '
          f'{time_empty_zeros:10.2e} {time_empty_ones:10.2e}')
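
Before reading the timings, it may be worth confirming that the hand-made variants build the same arrays as np.zeros and np.ones. Here is a small check along those lines. It uses a smaller array so it runs quickly, and the names `N_CHECK` and `filled_via_copyto` are illustrative.

```python
import numpy as np

N_CHECK = 1_000
dtypes = [np.int8, np.int16, np.int32, np.int64,
          np.uint8, np.uint16, np.uint32, np.uint64,
          np.float16, np.float32, np.float64]

def filled_via_copyto(n, value, dtype):
    """Hand-made constructor: uninitialized array, then copy a scalar into it."""
    out = np.empty(n, dtype=dtype)
    np.copyto(out, value)
    return out

for dtype in dtypes:
    zeros = np.zeros(N_CHECK, dtype=dtype)
    ones = np.ones(N_CHECK, dtype=dtype)
    assert np.array_equal(np.full(N_CHECK, 0, dtype=dtype), zeros)
    assert np.array_equal(np.full(N_CHECK, 1, dtype=dtype), ones)
    assert np.array_equal(filled_via_copyto(N_CHECK, 0, dtype), zeros)
    assert np.array_equal(filled_via_copyto(N_CHECK, 1, dtype), ones)
print("all variants agree")
```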

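The timings (in seconds per call) are suggestive. For the 4- and 8-byte dtypes, np.zeros is as cheap as np.empty. Every variant that writes the buffer explicitly (np.ones, np.full, or np.empty plus np.copyto) costs about the same as the others, even when the fill value is 0.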

DType         Empty      Zeros       Ones  FullZeros   FullOnes EmptyZeros  EmptyOnes
int8       1.37e-06   6.33e-04   5.73e-04   5.76e-04   5.73e-04   6.05e-04   5.82e-04
int16      1.61e-06   1.55e-03   3.54e-03   3.54e-03   3.56e-03   3.54e-03   3.54e-03
int32      7.22e-06   6.99e-06   1.24e-02   1.20e-02   1.25e-02   1.19e-02   1.21e-02
int64      8.26e-06   8.06e-06   2.62e-02   2.64e-02   2.61e-02   2.62e-02   2.62e-02
uint8      1.32e-06   6.30e-04   5.85e-04   5.86e-04   5.77e-04   5.70e-04   5.83e-04
uint16     1.32e-06   1.63e-03   3.61e-03   3.65e-03   4.08e-03   4.08e-03   3.58e-03
uint32     7.08e-06   7.20e-06   1.48e-02   1.41e-02   1.63e-02   1.44e-02   1.32e-02
uint64     7.14e-06   7.13e-06   2.69e-02   2.67e-02   2.82e-02   2.68e-02   2.72e-02
float16    1.31e-06   1.55e-03   3.56e-03   3.79e-03   3.54e-03   3.53e-03   3.55e-03
float32    7.11e-06   6.95e-06   1.36e-02   1.35e-02   1.37e-02   1.35e-02   1.37e-02
float64    7.27e-06   7.33e-06   3.13e-02   3.00e-02   2.75e-02   2.80e-02   2.75e-02

Regarding zeros being faster than ones: as suggested in the comments, I seem to remember that zeros does indeed use calloc. Being a system routine whose sole purpose is allocating zeroed blocks, calloc is probably good at that.
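
Here is a minimal sketch of why calloc-backed zeros can look free. For large blocks, the operating system can supply fresh pages that are already zero-filled. The cost then shows up when those pages are first written, not at allocation time. The helper name `alloc_then_touch` is hypothetical, and the numbers will vary with platform and allocator.

```python
import time
import numpy as np

def alloc_then_touch(make, n, dtype=np.float64):
    """Return (construction time, time of a subsequent full write) for make(n, dtype)."""
    t0 = time.perf_counter()
    arr = make(n, dtype=dtype)
    t1 = time.perf_counter()
    arr[...] = 0  # write every element, and hence touch every page
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

N = 10_000_000
for name, make in [("zeros", np.zeros), ("ones", np.ones)]:
    alloc, write = alloc_then_touch(make, N)
    print(f"{name:5s} construct {alloc:.2e} s   then write {write:.2e} s")
```

With np.zeros, construction is typically cheap and the later write absorbs the cost of faulting the pages in. With np.ones, the pages were already written during construction, so the second pass is usually the cheaper one.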
