Python: improving the performance of a for loop whose inner function call depends only on the loop index

Question

I am guilty of writing Python code as if it were Fortran. I am rewriting many parts of a long code that I originally wrote in Fortran, because I want to extend it significantly and that is much easier to do in Python for proof-of-concept work. If it can be sped up enough, I will simply use Python; I am not interested in turning the crank on these programs over and over. Once an idea is proven to work, I move on to the next problem, which is why I wish to work in Python. Unfortunately, as written in Python, the code currently takes several weeks to run. Even an order-of-magnitude speedup on the following for loop would make it a feasible testing platform.

A similar question has been posed for the R language:

Improving loop performance with function call inside

but surprisingly I don't see one for Python. This one is similar, but the function in its for loop has dependencies, which mine does not:

Improve performance of a for loop in Python (possibly with numpy or numba)

A huge bottleneck is a single for loop.

Simple Code

import numpy as np

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000
energy = np.zeros(nmol, dtype=np.float64)

for i in range(nmol):
    energy[i] = np.where( part != i,function(part,i),0.0) # if i = part, energy = 0.0

Speeding up the function itself is a separate problem. There must be a way to use numpy or another method to run all of the calls simultaneously.

For example purposes, let's say:

def function(i,j):
    for k in range(100000): # this loop is simply to make the time about a second or 2
        ener = (i + j) * (i * j) # entirely arbitrary and not my real problem
    return ener

In reality, my function calls several functions that depend on part and "i".

The full working example is:

import numpy as np
import time as time

def function(i,j):
    for k in range(10000): # this loop is simply to make the time about a second or 2
        ener = (i + j) * (i * j) # entirely arbitrary and not my real problem
    return ener

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000
energy = np.zeros(nmol, dtype=np.float64)

start = time.time()
for i in range(nmol):
    energy[i] = np.where( part != i,function(part,i),0.0) # if i = part, energy = 0.0

end = time.time()
print('time: ', end-start)

I am using Python version 3.6. It is imperative that, in the loop, the index "i" does not interact with the index "part".
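One detail of the loop in the question is worth spelling out: np.where receives its arguments already evaluated, so function(part, i) still runs when i equals part and only its result is discarded. A minimal sketch of the alternative, assuming a plain conditional expression is acceptable, is shown below. The calls list and the small nmol are illustrative additions, not part of the original code.

```python
import numpy as np

calls = []  # illustrative only: records the indices the function really ran for

def function(i, j):
    calls.append(j)
    return (i + j) * (i * j)  # same arbitrary expression as in the question

part = 3
nmol = 5  # kept small here so the output is easy to read
energy = np.zeros(nmol, dtype=np.float64)

for i in range(nmol):
    # A conditional expression evaluates only the branch it needs, so the
    # function is skipped entirely for the self index.
    energy[i] = function(part, i) if i != part else 0.0

print(calls)   # the self index (3) never appears
print(energy)
```

This does not make the loop faster in any meaningful way; it only guarantees that the self index never reaches the function.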

Answer 1

I am quite surprised to be the one to figure out an answer. I am not intentionally answering my own question; I have just been thinking about it on my own for a while now.

Rather than a for loop from index 0 to nmol - 1, make an integer array from 0 to nmol - 1 and simply call the function with that array. An array input then produces an array output. I modified the function so that it no longer takes the constant "part" as an argument (it reads it as a global instead).

This vectorized solution is ~27 times faster than the for loop, giving me the order of magnitude I required.

As the array length nmol gets bigger, the speed gain increases, and vice versa.

import numpy as np
import time as time

def function(i):
    for k in range(10000):
        ener = (i + part) + (i * part) # entirely arbitrary and not my real problem
    return ener

part  = 3 # a random index of an array, fixed here for example purposes
nmol = 1000

start = time.time()

part_list = np.arange(0,nmol,1)
part_list = np.delete(part_list,part)  # remove the self index

energy = function(part_list)  # calls the function in a vectorized form

end = time.time()
time2 = end-start
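Note that deleting the self index leaves energy with nmol - 1 entries, whereas the loop in the question produced nmol entries with 0.0 at index part. If the original layout matters, one possible sketch uses a boolean mask instead of np.delete. The name pair_energy and the explicit part argument are assumptions made here for illustration, and the body is the same arbitrary expression as above without the artificial delay loop.

```python
import numpy as np

def pair_energy(i, part):
    # Hypothetical stand-in for the real energy function. It works
    # elementwise on a NumPy array because it only uses array arithmetic.
    return (i + part) + (i * part)

part = 3
nmol = 1000

indices = np.arange(nmol)
mask = indices != part  # True everywhere except at the self index

energy = np.zeros(nmol, dtype=np.float64)
energy[mask] = pair_energy(indices[mask], part)  # self index stays 0.0

# Cross-check against an explicit loop over the same indices.
expected = np.array(
    [0.0 if i == part else pair_energy(i, part) for i in range(nmol)]
)
assert np.array_equal(energy, expected)
```

With this layout, energy[i] still refers to molecule i, so downstream code that indexes by molecule number does not need to change.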

Answer 2

Based on your vectorized solution, you can get an extra speedup using Pythran.

Original code:

import numpy as np

def function(i, part):
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

and associated benchmark:

python -m timeit -s 'import numpy as np; part  = 3; nmol = 1000; part_list = np.arange(0,nmol,1); part_list = np.delete(part_list, part); from a import function' 'function(part_list, part)'
10 loops, best of 3: 37.3 msec per loop

Then adding a pythran export comment:

import numpy as np

#pythran export function(int64[], int64)
def function(i, part):
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

And compiling the module with:

pythran a.py

gives an extra boost:

python -m timeit -s 'import numpy as np; part  = 3; nmol = 1000; part_list = np.arange(0,nmol,1); part_list = np.delete(part_list, part); from a import function' 'function(part_list, part)'
1000000 loops, best of 3: 1.53 usec per loop
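The same kind of measurement can also be scripted with the standard library timeit module rather than the command line. The sketch below times the uncompiled function only and defines it inline, which is an assumption made to keep the example self-contained. To time the compiled version, the function would instead be imported from the module a, as in the commands above.

```python
import timeit

import numpy as np

def function(i, part):
    # Same placeholder body as the uncompiled version in this answer.
    for k in range(10000):
        ener = (i + part) + (i * part)
    return ener

part = 3
nmol = 1000
part_list = np.delete(np.arange(0, nmol, 1), part)

# repeat=3 with number=10 mirrors the "10 loops, best of 3" command-line run.
timings = timeit.repeat(lambda: function(part_list, part), repeat=3, number=10)
best_per_call = min(timings) / 10

print(f"{best_per_call * 1e3:.1f} msec per loop")
```

Because the placeholder loop recomputes the same expression on every pass, a compiler may be able to drop the redundant iterations, so the ratio measured on this example may not carry over to a real energy function.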

2 answers

Answer 1
1 accepted 2018-08-05 21:10:25

Answer 2
1 2018-10-28 08:01:21
