Python - Passing a vector to a function is slower than calling the function in a loop for each element. Numpy slower than native Python

Question

I am working on code to perform a transient simulation, calling a function for each second of the day and eventually of the whole year. I thought I had found an opportunity to speed up the code by passing a vector of inputs instead of calling the function in a for loop. However, my code runs slower when I do this, and I don't understand why.

I would hope for the vector version to be only slightly slower than a single pass of the for loop, which would give me my targeted speedup.

Can you please help explain and/or solve this issue? My sample below, a simplification of the larger program, shows three approaches.

Inside the function is the Eg variable, which is currently set to zero. If I use Eg=float(0) or Eg=np.array([0,0,0,0,0]), the code runs slower, and I assume this is the same issue as the larger question.

The results from the code below are:

Execution time for numpy vector is 716.225 ms
Execution time for 'for-loop' 6 calls is 389.87 ms
Execution time for numpy float32 'for-loop' is 3906.9069999999997 ms

Code sample:

from datetime import datetime, timedelta
import numpy as np

def Q_Walls( A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2) # convection and conduction only
    Eg = 0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    
    return T_p1, Eout, T2_surf

def Q_Walls_vect( A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2) # convection and conduction only
    Eg = 0 #np.array([0,0,0,0,0], 'float64')
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    
    return T_p1, Eout, T2_surf



A= R1= R2= R3= R4= m= cp= np.array([1,1,1,1,1], 'float32')
dt= np.array([1,1,1,1,1], 'float32')
T_inf_inside = np.array([250,250,250,250,250], 'float32')
T_inf_outside = np.array([250.2,250.2,250.2,250.2,250.2], 'float32')
T_p_wall = np.array([250.1,250.1,250.1,250.1,250.1], 'float32')

t_max =87000

begin_time = datetime.now()


for x in np.arange(t_max):
    T_p_wall, Enet_wall, Tinside_surf = Q_Walls_vect(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
    
end_time = (datetime.now() - begin_time)
print(f"Execution time for numpy vector is {end_time.total_seconds()*1000} ms")

A= R1= R2= R3= R4= m= cp= float(1.1)
dt= float(1)
T_inf_inside = float(250.01)
T_p_wall = float(250.1)
T_inf_outside = float(250.2)

begin_time = datetime.now()


for x in np.arange(t_max):
    for j in range(6):
        T_p_wall, Enet_wall, Tinside_surf = Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
    
end_time = (datetime.now() - begin_time)
  
print(f"Execution time for 'for-loop' 6 calls is {end_time.total_seconds()*1000} ms")


A= R1= R2= R3= R4= m= cp= np.float32(1.1)
dt= 1
T_inf_inside = np.float32(250.01)
T_p_wall = np.float32(250.1)
T_inf_outside = np.float32(250.2)

begin_time = datetime.now()


for x in np.arange(t_max):
    for j in range(6):
        T_p_wall, Enet_wall, Tinside_surf = Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
    
end_time = (datetime.now() - begin_time)
print(f"Execution time for numpy float32 'for-loop' is {end_time.total_seconds()*1000} ms")

Answer 1

There are multiple issues in the code:

1. Numpy is quite fast for big arrays but not for very small ones: creating, allocating, and freeing temporary arrays is expensive, as is calling native Numpy functions from the Python interpreter.
2. Integer-typed and float32-typed variables are promoted to float64 when you perform binary operations such as [int] BIN_OP [float32] and [float32] BIN_OP [float64], in either operand order. This causes more temporary arrays to be created and several implicit conversions to be done, making the code significantly slower.
3. CPython loops are very slow because CPython is an interpreter.
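The promotion described in the second point can be observed by inspecting the result dtypes directly. The following is a minimal sketch, not part of the original benchmark; the variable names are illustrative. The result of mixing float32 with a plain Python int is printed rather than assumed, since it may differ between Numpy versions.

```python
import numpy as np

a = np.float32(1.1)

# float32 combined with float32 stays float32
same = a / np.float32(2)

# float32 combined with a float64 operand is promoted to float64
mixed = a * np.float64(2)

# float32 combined with a Python int: the result dtype may depend on the
# installed Numpy version, so it is only printed here
with_int = a / 2

print("float32 / float32 ->", same.dtype)
print("float32 * float64 ->", mixed.dtype)
print("float32 / int     ->", with_int.dtype)
```

Checking dtypes this way on the intermediate values of Q_Walls is a quick way to find where an unwanted conversion enters the computation.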

The second point can be fixed using the following example code:

f32_const_0 = np.float32(0)
f32_const_2 = np.float32(2)

def Q_Walls_float32( A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/f32_const_2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/f32_const_2) # convection and conduction only
    Eg = f32_const_0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/f32_const_2 / A)
    
    return T_p1, Eout, T2_surf

You can mitigate the cost with Numba (or Cython), but the best approach is not to use Numpy arrays for only a few elements, or to do the computation element-wise directly in Numba so that few temporary arrays are created.

Here is an example of Numba code:

from datetime import datetime, timedelta
import numpy as np
import numba as nb

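A= R1= R2= R3= R4= m= cp= float(1.1)
dt= float(1)
T_inf_inside = float(250.01)
T_p_wall = float(250.1)
T_inf_outside = float(250.2)

t_max = 87000 # number of time steps, as in the question; must be defined before the functions below are compiled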

@nb.njit(nb.types.UniTuple(nb.float64,3)(nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, 
                                            nb.float64, nb.float64, nb.float64, nb.float64))
def Q_Walls( A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p):
    Ein = (T_inf_outside - T_p) * A / (R2/2 + R3 + R4) # convection and conduction only
    Eout = (T_p - T_inf_inside) * A / (R1 + R2/2) # convection and conduction only
    Eg = 0
    Enet = Eg + Ein - Eout
    T_p1 = (Enet * dt / (m * cp) + T_p) # average bulk temperature of wall after time dt
    T2_surf = (T_p - Eout * R2/2 / A)
    
    return (T_p1, Eout, T2_surf)

@nb.njit(nb.types.UniTuple(nb.float64,3)(nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, nb.float64, 
                                            nb.float64, nb.float64, nb.float64, nb.float64))
def compute_with_numba(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall):
    for x in np.arange(t_max):
        for j in range(6):
            T_p_wall, Enet_wall, Tinside_surf = Q_Walls(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
    return (T_p_wall, Enet_wall, Tinside_surf)

begin_time = datetime.now()

T_p_wall, Enet_wall, Tinside_surf = compute_with_numba(A, R1, R2, R3, R4, m, cp, T_inf_inside, T_inf_outside, dt, T_p_wall)
    
end_time = (datetime.now() - begin_time)
  
print(f"Execution time for 'for-loop' 6 calls is {end_time.total_seconds()*1000} ms")

Here are timing results on my machine:

Initial execution:

Execution time for numpy vector is 758.232 ms
Execution time for 'for-loop' 6 calls is 256.093 ms
Execution time for numpy float32 'for-loop' is 3768.253 ms

----------

Fixed execution (Q_Walls_float32):

Execution time for numpy float32 'for-loop' is 839.016 ms

----------

With Numba (compute_with_numba):

Execution time for 'for-loop' 6 calls is 6.311 ms
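
For the vectorized case (several walls at once), the element-wise approach mentioned above could look like the following sketch. It is an illustration under assumptions, not the code that was timed: the function name step_walls, the n_steps parameter, the scalar dt, and the fallback decorator used when Numba is not installed are all hypothetical. The explicit inner loop updates each wall in place, so no temporary array is created per time step.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: run as plain Python if Numba is unavailable
    def njit(func):
        return func

@njit
def step_walls(A, R1, R2, R3, R4, m, cp, T_in, T_out, dt, T_p, n_steps):
    """Advance every wall by n_steps time steps, updating T_p in place."""
    Eout = np.zeros_like(T_p)
    T2_surf = np.zeros_like(T_p)
    for _ in range(n_steps):
        for i in range(T_p.shape[0]):
            # Same formulas as Q_Walls, applied to one wall at a time
            Ein = (T_out[i] - T_p[i]) * A[i] / (R2[i] / 2 + R3[i] + R4[i])
            Eout[i] = (T_p[i] - T_in[i]) * A[i] / (R1[i] + R2[i] / 2)
            T2_surf[i] = T_p[i] - Eout[i] * R2[i] / 2 / A[i]
            T_p[i] += (Ein - Eout[i]) * dt / (m[i] * cp[i])
    return T_p, Eout, T2_surf

n_walls = 5
ones = np.ones(n_walls)            # float64, so no dtype conversion occurs
T_in = np.full(n_walls, 250.0)
T_out = np.full(n_walls, 250.2)
T_p = np.full(n_walls, 250.1)

T_p, Eout, T2_surf = step_walls(ones, ones, ones, ones, ones, ones, ones,
                                T_in, T_out, 1.0, T_p, 1000)
print(T_p, Eout, T2_surf)
```

Keeping the time loop inside the compiled function matters as much as the element-wise loop: calling a compiled function once per time step from CPython would still pay the interpreter overhead on every call.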
