Improve the performance of a for loop in Python (possibly with numpy or numba)

Question

I want to improve the performance of the for loop in this function.

import numpy as np
import random

def play_game(row, n=1000000):
    """Play the game! This game is a kind of random walk.

    Arguments:
        row (int[]): row index to use in the p matrix for each step in the
                     walk. The length of this array is the same as n.

        n (int): number of steps in the random walk
    """
    p = np.array([[ 0.499,  0.499,  0.499],
                  [ 0.099,  0.749,  0.749]])
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0

    for j in range(n):
        tempX = X[j] = tempX + 2 * (random.random() < p.item(row.item(j), Y)) - 1
        Y = tempX % 3

    return np.r_[X0, X]

The difficulty is that Y is computed at each step from the current value of X, and that Y is then used in the next step to update X.

I wonder if there is some numpy trick that could make a big difference. Using Numba is fair game (I tried it but without much success). However, I do not want to use Cython.

Answer 1

A quick observation tells us that there is a data dependency between iterations in the function code. There are different kinds of data dependencies. The kind you have here is an indexing dependency: the data selected at any iteration depends on the calculations of the previous iteration. This dependency seemed difficult to trace between iterations, so this post isn't really a vectorized solution. Rather, we would try to pre-compute as many of the values used within the loop as possible. The basic idea is to do minimum work inside the loop.

Here's a brief explanation of how we can proceed with pre-calculations and thus have a more efficient solution:

Given the relatively small shape of p, from which row elements are extracted based on the input row, you can pre-select all those rows from p with p[row].
For each iteration, you are calculating a random number. You can replace this with a random array that you set up before the loop, so those random values are precalculated as well.
From these precalculated values, you can derive the next column index for every possible current column: because Y always equals X % 3, a step of +1 or -1 moves Y to (Y + step) % 3. Note that this gives a large ndarray holding the next column index for all three possible values of Y at every step; inside the loop, only one is chosen per iteration. Using the per-iteration column index, you increment or decrement the running value, starting from X0, to get the per-iteration output.
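
As a small illustration of the three precomputation steps above, the sketch below builds the intermediate arrays for a toy input of four steps. The values of row, the seeded generator, and the name p_rows are assumptions chosen for illustration only; p is the matrix used in the implementation in this post.

```python
import numpy as np

# Toy inputs for illustration only (hypothetical values).
rng = np.random.default_rng(0)
n = 4
row = np.array([0, 1, 1, 0])
randarr = rng.random(n)
p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])

# Step 1: pick the row of p used at each step -> shape (n, 3).
p_rows = p[row]

# Step 2: compare each step's random number against all three candidate
# probabilities at once -> +1 or -1 for every possible Y, shape (n, 3).
signvals = 2 * (randarr[:, None] < p_rows) - 1

# Step 3: since Y == X % 3, a step of +1 or -1 moves Y to (Y + step) % 3.
col_idx = (signvals + np.arange(3)) % 3

print(p_rows.shape, signvals.shape, col_idx.shape)
```

Each row of signvals and col_idx covers all three possible values of Y, so the loop only has to pick one column per iteration.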

The implementation would look like this:

randarr = np.random.rand(n)
p = np.array([[ 0.499,  0.419,  0.639],
              [ 0.099,  0.749,  0.319]])

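def play_game_partvect(row,n,randarr,p):
    # Partially vectorized walk: precompute per-step values, keep a thin loop.

    X0 = 100
    Y0 = X0 % 3

    # Step (+1 or -1) for every possible Y at every iteration, shape (n, 3)
    signvals = 2*(randarr[:,None] < p[row]) - 1
    # Next Y for every possible current Y, since Y == X % 3
    col_idx = (signvals + np.arange(3)) % 3

    Y = Y0
    currval = X0
    out = np.empty(n+1)
    out[0] = X0
    # Only the Y-dependent lookups remain inside the loop
    for j in range(n):
        currval = currval + signvals[j,Y]
        out[j+1] = currval
        Y = col_idx[j,Y]

    return out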

For verification against the original code, you would modify the original code to take the same inputs, like so:

def play_game(row,n,randarr,p):
    X0 = 100
    Y0 = X0 % 3
    X = np.zeros(n)
    tempX = X0
    Y = Y0
    for j in range(n):
        tempX = X[j] = tempX + 2 * (randarr[j] < p.item(row.item(j), Y)) - 1
        Y = tempX % 3
    return np.r_[X0, X]

Please note that this modified code also precomputes the random values, so it would already give you a good speedup over the code in the question.

Runtime tests and output verification:

In [2]: # Inputs
   ...: n = 1000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499,  0.419,  0.639],
   ...:               [ 0.099,  0.749,  0.319]])
   ...: 

In [3]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[3]: True

In [4]: %timeit play_game(row,n,randarr,p)
100 loops, best of 3: 11.6 ms per loop

In [5]: %timeit play_game_partvect(row,n,randarr,p)
1000 loops, best of 3: 1.51 ms per loop

In [6]: # Inputs
   ...: n = 10000
   ...: row = np.random.randint(0,2,(n))
   ...: randarr = np.random.rand(n)
   ...: p = np.array([[ 0.499,  0.419,  0.639],
   ...:               [ 0.099,  0.749,  0.319]])
   ...: 

In [7]: np.allclose(play_game_partvect(row,n,randarr,p),play_game(row,n,randarr,p))
Out[7]: True

In [8]: %timeit play_game(row,n,randarr,p)
10 loops, best of 3: 116 ms per loop

In [9]: %timeit play_game_partvect(row,n,randarr,p)
100 loops, best of 3: 14.8 ms per loop

Thus, we are seeing a speedup of more than 7.5x, not bad!
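
Outside IPython, a similar check can be run as a plain script with the standard timeit module. The following is only a sketch under stated assumptions: the function names reference_walk and precomputed_walk, the x0 parameter, the seed, and the choice of n are hypothetical, and the reference loop uses ordinary indexing rather than p.item. Timings will vary by machine, so none are quoted here.

```python
import timeit
import numpy as np

def reference_walk(row, randarr, p, x0=100):
    # Straightforward loop: look up the probability at every step.
    n = len(randarr)
    out = np.empty(n + 1)
    out[0] = x0
    x = x0
    for j in range(n):
        step = 1 if randarr[j] < p[row[j], x % 3] else -1
        x += step
        out[j + 1] = x
    return out

def precomputed_walk(row, randarr, p, x0=100):
    # Same walk, with steps and next-Y tables built before the loop.
    n = len(randarr)
    signvals = 2 * (randarr[:, None] < p[row]) - 1
    col_idx = (signvals + np.arange(3)) % 3
    out = np.empty(n + 1)
    out[0] = x0
    x, y = x0, x0 % 3
    for j in range(n):
        x += signvals[j, y]
        out[j + 1] = x
        y = col_idx[j, y]
    return out

# Hypothetical test inputs; p is the matrix used in the timings above.
rng = np.random.default_rng(0)
n = 2000
row = rng.integers(0, 2, n)
randarr = rng.random(n)
p = np.array([[0.499, 0.419, 0.639],
              [0.099, 0.749, 0.319]])

# Both versions consume the same random numbers, so outputs should match.
same = np.array_equal(reference_walk(row, randarr, p),
                      precomputed_walk(row, randarr, p))
t_ref = timeit.timeit(lambda: reference_walk(row, randarr, p), number=5)
t_pre = timeit.timeit(lambda: precomputed_walk(row, randarr, p), number=5)
print("outputs match:", same)
print("reference: %.4f s, precomputed: %.4f s" % (t_ref, t_pre))
```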

Answer 1
1 2015-10-16 09:47:24
