Why is full_like so slow (compared to other approaches)?

When I was refactoring my code, I noticed that I had written these lines:

    class_proba = np.empty_like(self.classes_, dtype=np.float64)
    class_proba[:] = np.nan

so I thought it would be faster to use full_like. But since the first rule of performance is "measure everything" (and not just "don't talk about performance"), I did that, and was quite surprised:

import numpy as np
import time

# Variant 1: full_like with dtype=np.float64
t0 = time.time()
for _ in range(500):
    np.full_like(np.array([1, 2, 3]), np.nan, dtype=np.float64)
print(time.time() - t0)

# Variant 2: empty_like followed by a slice assignment
t0 = time.time()
for _ in range(500):
    class_proba = np.empty_like(np.array([1, 2, 3]), dtype=np.float64)
    class_proba[:] = np.nan
print(time.time() - t0)

results in

0.008994579315185547
0.0019996166229248047

I was very surprised. Maybe it was the use of "empty" that made the empty_like version so fast, or the wrong type? But even

t0 = time.time()
for _ in range(500):
    class_proba = np.ones_like(np.array([1, 2, 3]), dtype=np.float64)
    class_proba[:] = np.nan
print(time.time() - t0)

is faster than using full_like, with

0.003997087478637695

and

t0 = time.time()
for _ in range(500):
    np.full_like(np.array([1, 2, 3]), np.nan, dtype=float)
print(time.time() - t0)

is (better but) still not as fast, with

0.0029981136322021484

So my question is: why is it so slow to use full_like in this case, compared to the other two approaches?

Next I tried to make the array much larger:

    a = np.arange(1000000)

    # full_like with dtype=np.float64
    t0 = time.time()
    for _ in range(500):
        np.full_like(a, np.nan, dtype=np.float64)
    print(time.time() - t0)

    # full_like with dtype=float
    t0 = time.time()
    for _ in range(500):
        np.full_like(a, np.nan, dtype=float)
    print(time.time() - t0)

    # empty_like followed by a slice assignment
    t0 = time.time()
    for _ in range(500):
        class_proba = np.empty_like(a, dtype=np.float64)
        class_proba[:] = np.nan
    print(time.time() - t0)

    # ones_like followed by a slice assignment
    t0 = time.time()
    for _ in range(500):
        class_proba = np.ones_like(a, dtype=np.float64)
        class_proba[:] = np.nan
    print(time.time() - t0)

resulting in

1.9398648738861084
1.819936990737915
1.853914499282837
2.292659044265747

Can someone explain this as well?

If the timeit module from the Python standard library is used, the performance does not match what was reported in the OP's question. np.full_like and np.empty_like ... out[:] = nan perform similarly, and np.ones_like ... out[:] = nan is the slowest.

This makes intuitive sense when considering the number of operations. All of the methods here use np.empty_like at their core. np.ones_like creates an empty array and then overwrites each value with 1. np.full_like creates an empty array and then overwrites each value with the fill value passed to the np.full_like call. It makes sense that np.ones_like ... out[:] = np.nan is the slowest, because it creates an empty array, fills it with 1, and then fills it again with np.nan: two passes over the array instead of one. On the other hand, np.full_like and np.empty_like ... out[:] = np.nan are similar in terms of number of operations, and have almost identical performance.
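The operation counts described above can be sketched in plain NumPy. This is only an illustration, not NumPy's actual source: the helper names full_like_sketch and ones_then_nan_sketch are hypothetical, and the use of np.copyto for the fill step is an assumption about how the fill is carried out.

```python
import numpy as np

def full_like_sketch(a, fill_value, dtype=None):
    """Hypothetical stand-in for np.full_like: allocate, then fill once."""
    out = np.empty_like(a, dtype=dtype)            # allocation only, no fill pass
    np.copyto(out, fill_value, casting="unsafe")   # single pass writing fill_value
    return out

def ones_then_nan_sketch(a, dtype=None):
    """Hypothetical stand-in for np.ones_like followed by out[:] = np.nan."""
    out = np.empty_like(a, dtype=dtype)            # allocation only
    np.copyto(out, 1, casting="unsafe")            # first pass: fill with 1
    out[:] = np.nan                                # second pass: overwrite with NaN
    return out

a = np.arange(5)
x = full_like_sketch(a, np.nan, dtype=np.float64)
y = np.full_like(a, np.nan, dtype=np.float64)
z = ones_then_nan_sketch(a, dtype=np.float64)
print(x.dtype, y.dtype, z.dtype)
```

All three results are float64 arrays filled with NaN; the difference is that the last variant writes every element twice.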

import timeit

# np.full_like: allocate and fill in one call
timeit.main(
    ["--setup", "import numpy as np; a = np.arange(1000000)",
     "class_proba = np.full_like(a, np.nan, dtype=np.float64)"])

# np.empty_like followed by a slice assignment
timeit.main(
    ["--setup", "import numpy as np; a = np.arange(1000000)",
     """\
class_proba = np.empty_like(a, dtype=np.float64)
class_proba[:] = np.nan"""])

# np.ones_like followed by a slice assignment
timeit.main(
    ["--setup", "import numpy as np; a = np.arange(1000000)",
     """\
class_proba = np.ones_like(a, dtype=np.float64)
class_proba[:] = np.nan"""])

The output is below:

1000 loops, best of 3: 616 usec per loop
1000 loops, best of 3: 619 usec per loop
1000 loops, best of 3: 1.23 msec per loop
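The same kind of measurement can be applied to the three-element array from the question. The following is a minimal sketch using timeit.repeat; the helper name bench and the number/repeat values are assumptions chosen for illustration. Unlike the loops in the question, the input array is created once outside the timed statement, so only the fill variants themselves are measured. No expected timings are given here, since they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def bench(stmt, a, number=1000, repeat=5):
    """Hypothetical helper: best observed time per call, in seconds."""
    times = timeit.repeat(
        stmt,
        globals={"np": np, "a": a},  # names visible to the timed statement
        number=number,
        repeat=repeat,
    )
    # The minimum over repeats is the least disturbed by other system activity.
    return min(times) / number

statements = {
    "full_like": "out = np.full_like(a, np.nan, dtype=np.float64)",
    "empty_like + assign": "out = np.empty_like(a, dtype=np.float64)\nout[:] = np.nan",
    "ones_like + assign": "out = np.ones_like(a, dtype=np.float64)\nout[:] = np.nan",
}

small = np.array([1, 2, 3])
results = {name: bench(stmt, small) for name, stmt in statements.items()}

for name, seconds in results.items():
    print(f"{name:>20}: {seconds * 1e6:.2f} usec per call")
```

Taking the best of several repeats, rather than a single time.time() difference over 500 iterations, makes the comparison far less sensitive to timer resolution and background noise.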


