How to optimize mathematical operations on matrices in Python

Question

I am trying to reduce the running time of a function that performs a series of calculations with two matrices. While searching for this, I've heard of numpy, but I really do not know how to apply it to my problem. Also, I think one of the things making my function slow is the many dot operators (attribute lookups such as self._data.stream_matrix); I read about this on this page.

The math corresponds to a factorization for the Quadratic Assignment Problem (QAP):

[Image: QAP factorization]
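The original image is not available. Reconstructed from the code that follows (so not necessarily the exact formula in the image), and writing f = stream_matrix, d = distance_matrix and π = sol (notation introduced here for illustration), the quantity being computed is:

$$
\Delta(\pi, r, s) = \sum_{k \ne r,\, s} \Big[ f_{rk}\,(d_{\pi(s)\pi(k)} - d_{\pi(r)\pi(k)}) + f_{sk}\,(d_{\pi(r)\pi(k)} - d_{\pi(s)\pi(k)}) + f_{kr}\,(d_{\pi(k)\pi(s)} - d_{\pi(k)\pi(r)}) + f_{ks}\,(d_{\pi(k)\pi(r)} - d_{\pi(k)\pi(s)}) \Big]
$$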

My code is:

    def deltaC(self, r, s, sol):
        delta = 0
        for k in xrange(self._tam):
            if k != r and k != s:
                # The parentheses let the expression continue over several lines
                delta += (
                    self._data.stream_matrix[r][k]
                    * (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) +
                    self._data.stream_matrix[s][k]
                    * (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]) +
                    self._data.stream_matrix[k][r]
                    * (self._data.distance_matrix[sol[k]][sol[s]] - self._data.distance_matrix[sol[k]][sol[r]]) +
                    self._data.stream_matrix[k][s]
                    * (self._data.distance_matrix[sol[k]][sol[r]] - self._data.distance_matrix[sol[k]][sol[s]]))
        return delta

Running this on a problem of size 20 (a 20x20 matrix) takes about 20 seconds, and the bottleneck is in this function:

    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    303878   15.712    0.000   15.712    0.000 Heuristic.py:66(deltaC)
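A table in this format comes from Python's cProfile/pstats modules. A minimal Python 3 sketch of producing one from inside a script follows; busy_loop is a hypothetical stand-in for deltaC, and profile_report is likewise an illustrative name:

```python
import cProfile
import io
import pstats


def busy_loop(n):
    """Hypothetical stand-in for deltaC: a pure-Python arithmetic loop."""
    total = 0
    for k in range(n):
        total += k * k
    return total


def profile_report(calls=1000, limit=5):
    """Profile repeated calls to busy_loop and return the pstats table as a string."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(calls):
        busy_loop(100)
    profiler.disable()

    stream = io.StringIO()
    # Sort by tottime (time spent inside each function itself), as in the output above.
    pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats(limit)
    return stream.getvalue()


if __name__ == "__main__":
    print(profile_report())
```

From the command line, python -m cProfile -s tottime script.py prints a similar table for a whole script.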

I tried to replace the for loop with map, but since the loop body isn't a function call, that is not possible.

How could I reduce the time?

EDIT1

To answer eickenberg's comment:

sol is a permutation, for example [1,2,3,4]. The function is called when I am generating neighbor solutions; for example, a neighbor of [1,2,3,4] is [2,1,3,4]. I change only two positions in the original permutation and then call deltaC, which calculates a factorization of the solution with positions r,s swapped (in the example above, r,s = 0,1). This is done to avoid calculating the entire cost of the neighbor solution. I suppose I can store the values of sol[k], sol[r] and sol[s] in local variables to avoid looking them up in each iteration. I do not know if this is what you were asking in your comment.
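The point of deltaC is that a swap only changes the terms involving positions r or s, so an O(n) sum replaces an O(n^2) recomputation of the cost. The Python 3 sketch below checks this by brute force on the matrices from the minimal working example. It assumes the usual QAP cost C(π) = Σ_i Σ_j f_ij d_π(i)π(j), and the helper names qap_cost, swap_delta and swapped are hypothetical. Because the k ≠ r, s sum leaves out the (r, s), (s, r), (r, r) and (s, s) pairs, it equals the full cost difference only when the distance matrix is symmetric with a zero diagonal, as it is here.

```python
from itertools import permutations

# Matrices from the minimal working example: the distance matrix is symmetric
# with a zero diagonal, which is what makes the k != r, s sum exact.
distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]


def qap_cost(sol, flow=stream_matrix, dist=distance_matrix):
    """Full O(n^2) cost: sum of flow[i][j] * dist[sol[i]][sol[j]] over all i, j."""
    n = len(sol)
    return sum(flow[i][j] * dist[sol[i]][sol[j]] for i in range(n) for j in range(n))


def swap_delta(sol, r, s, flow=stream_matrix, dist=distance_matrix):
    """O(n) cost change for swapping positions r and s (same four terms as deltaC)."""
    sol_r, sol_s = sol[r], sol[s]
    delta = 0
    for k in range(len(sol)):
        if k != r and k != s:
            sol_k = sol[k]
            delta += (flow[r][k] * (dist[sol_s][sol_k] - dist[sol_r][sol_k])
                      + flow[s][k] * (dist[sol_r][sol_k] - dist[sol_s][sol_k])
                      + flow[k][r] * (dist[sol_k][sol_s] - dist[sol_k][sol_r])
                      + flow[k][s] * (dist[sol_k][sol_r] - dist[sol_k][sol_s]))
    return delta


def swapped(sol, r, s):
    """Neighbor of sol with positions r and s exchanged."""
    neighbor = list(sol)
    neighbor[r], neighbor[s] = neighbor[s], neighbor[r]
    return neighbor


if __name__ == "__main__":
    sol = [0, 1, 2, 3]
    print(swap_delta(sol, 0, 1))                         # -8
    print(qap_cost(swapped(sol, 0, 1)) - qap_cost(sol))  # -8 (342 - 350)
    # Exhaustive check over every permutation and every pair (r, s).
    assert all(
        qap_cost(swapped(list(p), r, s)) - qap_cost(list(p)) == swap_delta(list(p), r, s)
        for p in permutations(range(4)) for r in range(4) for s in range(4)
    )
```

For non-symmetric matrices, the omitted (r, s) pair terms would have to be added back to get the exact cost difference.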

EDIT2

A minimal working example:

import random


distance_matrix = [[0, 12, 6, 4], [12, 0, 6, 8], [6, 6, 0, 7], [4, 8, 7, 0]]
stream_matrix = [[0, 3, 8, 3], [3, 0, 2, 4], [8, 2, 0, 5], [3, 4, 5, 0]]

def deltaC(r, s, S=None):
    '''
    Difference between C with values i and j swapped
    '''

    # Default to the identity permutation when no solution is passed in
    sol = S if S is not None else [0, 1, 2, 3]

    delta = 0

    sol_r, sol_s = sol[r], sol[s]

    for k in xrange(4):
        if k != r and k != s:
            delta += (stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s]))
    return delta


for _ in xrange(303878):
    d = deltaC(random.randint(0,3), random.randint(0,3))
print d

Now I think the better option is to use NumPy. I tried with Matrix(), but it did not improve the performance.

Best solution found

Well, finally I was able to reduce the time a bit more by combining @TooTone's solution with storing the indexes in a set to avoid the if. The time dropped from about 18 seconds to 8 seconds. Here is the code:

def deltaC(self, r, s, sol=None):
    delta = 0
    sol = self.S if sol is None else sol  # use the passed-in solution when there is one
    sol_r, sol_s = sol[r], sol[s]

    stream_matrix = self._data.stream_matrix
    distance_matrix = self._data.distance_matrix

    indexes = set(xrange(self._tam)) - set([r, s])

    for k in indexes:
        sol_k = sol[k]
        delta += \
            (stream_matrix[r][k] - stream_matrix[s][k]) \
            * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
            + \
            (stream_matrix[k][r] - stream_matrix[k][s]) \
            * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta

To reduce the time even further, I think the best option would be to write a module.

Answer 1

In the simple example you've given, with for k in xrange(4), the loop body only executes three times (if r==s) or twice (if r!=s), and an initial numpy implementation, below, is slower by a large factor. Numpy is optimized for performing calculations over long vectors, and if the vectors are short the overheads can outweigh the benefits. (Note also that in this formula the matrices are sliced along different dimensions and indexed non-contiguously, which can only make things more complicated for a vectorizing implementation.)

import numpy as np

distance_matrix_np = np.array(distance_matrix)
stream_matrix_np = np.array(stream_matrix)
n = 4

def deltaC_np(r, s, sol):
    sol_r, sol_s = sol[r], sol[s]

    K = np.array([i for i in xrange(n) if i!=r and i!=s])

    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

In this numpy implementation, rather than a for loop over the elements in K, the operations are applied to all the elements of K at once inside numpy. Also, note that your mathematical expression can be simplified: the bracketed distance difference multiplying stream_matrix[s][k] is the negative of the one multiplying stream_matrix[r][k], and likewise for stream_matrix[k][s] and stream_matrix[k][r].
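Written out with f = stream_matrix, d = distance_matrix and π = sol (notation introduced here for illustration), the two pairs of terms for each k collapse as follows:

$$
\begin{aligned}
f_{rk}\,(d_{\pi(s)\pi(k)} - d_{\pi(r)\pi(k)}) + f_{sk}\,(d_{\pi(r)\pi(k)} - d_{\pi(s)\pi(k)}) &= (f_{rk} - f_{sk})\,(d_{\pi(s)\pi(k)} - d_{\pi(r)\pi(k)}) \\
f_{kr}\,(d_{\pi(k)\pi(s)} - d_{\pi(k)\pi(r)}) + f_{ks}\,(d_{\pi(k)\pi(r)} - d_{\pi(k)\pi(s)}) &= (f_{kr} - f_{ks})\,(d_{\pi(k)\pi(s)} - d_{\pi(k)\pi(r)})
\end{aligned}
$$

so each iteration needs 4 distance lookups and 2 multiplications instead of 8 lookups and 4 multiplications.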

This applies to your original code too. For example, (self._data.distance_matrix[sol[s]][sol[k]] - self._data.distance_matrix[sol[r]][sol[k]]) is equal to -1 times (self._data.distance_matrix[sol[r]][sol[k]] - self._data.distance_matrix[sol[s]][sol[k]]), so you were doing unnecessary computation, and your original code can be optimized without using numpy.
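A quick empirical check that this simplification is exact is to compare both forms on random integer matrices. The Python 3 sketch below does that; delta_four_terms, delta_two_terms, random_matrix and check are hypothetical names. The matrices need not be symmetric, so the check covers the general case, and integers keep the comparison exact.

```python
import random


def delta_four_terms(flow, dist, sol, r, s):
    """Original form: four products per k."""
    sol_r, sol_s = sol[r], sol[s]
    delta = 0
    for k in range(len(sol)):
        if k != r and k != s:
            sol_k = sol[k]
            delta += (flow[r][k] * (dist[sol_s][sol_k] - dist[sol_r][sol_k])
                      + flow[s][k] * (dist[sol_r][sol_k] - dist[sol_s][sol_k])
                      + flow[k][r] * (dist[sol_k][sol_s] - dist[sol_k][sol_r])
                      + flow[k][s] * (dist[sol_k][sol_r] - dist[sol_k][sol_s]))
    return delta


def delta_two_terms(flow, dist, sol, r, s):
    """Simplified form: brackets that are negatives of each other are merged."""
    sol_r, sol_s = sol[r], sol[s]
    delta = 0
    for k in range(len(sol)):
        if k != r and k != s:
            sol_k = sol[k]
            delta += ((flow[r][k] - flow[s][k]) * (dist[sol_s][sol_k] - dist[sol_r][sol_k])
                      + (flow[k][r] - flow[k][s]) * (dist[sol_k][sol_s] - dist[sol_k][sol_r]))
    return delta


def random_matrix(n, rng):
    """n x n list of lists with random integer entries (not necessarily symmetric)."""
    return [[rng.randint(0, 50) for _ in range(n)] for _ in range(n)]


def check(n=12, trials=200, seed=0):
    """Return True if both forms agree on `trials` random permutations and pairs (r, s)."""
    rng = random.Random(seed)
    flow, dist = random_matrix(n, rng), random_matrix(n, rng)
    for _ in range(trials):
        sol = list(range(n))
        rng.shuffle(sol)
        r, s = rng.randrange(n), rng.randrange(n)
        if delta_four_terms(flow, dist, sol, r, s) != delta_two_terms(flow, dist, sol, r, s):
            return False
    return True
```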

It turns out that the bottleneck in the numpy function is the innocent-looking list comprehension:

K = np.array([i for i in xrange(n) if i!=r and i!=s])

Once this is replaced with vectorized code:

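# Build K = 0..n-1 without r and s by shifting the indices past r and s
if r==s:
    K=np.arange(n-1)
    K[r:] += 1    # positions >= r now hold r+1, r+2, ..., skipping r
else:
    K=np.arange(n-2)
    if r<s:
        K[r:] += 1    # skip r
        K[s-1:] += 1  # position s-1 held s after the first shift; skip s
    else:
        K[s:] += 1
        K[r-1:] += 1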

the numpy function is much faster.
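A boolean mask is another way to build K without a Python-level loop. It handles r == s and r != s the same way and returns the indices in increasing order, like the list comprehension. This is a sketch assuming numpy; indices_except is a hypothetical helper name, and it has not been timed against the shifting version above.

```python
import numpy as np


def indices_except(n, r, s):
    """Return array([0, ..., n-1]) with r and s removed (r == s is allowed)."""
    mask = np.ones(n, dtype=bool)
    mask[[r, s]] = False  # a repeated index just clears the same entry twice
    return np.flatnonzero(mask)


if __name__ == "__main__":
    print(indices_except(6, 4, 1))  # [0 2 3 5]
```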

A graph of run times is shown immediately below (the original graph, before optimizing the numpy function, is at the very bottom of this answer). You can see that it makes sense to use either your optimized original code or the numpy code, depending on how large the matrix is.

[Figure: run time vs. matrix size for the Original, Optimized and numpy versions]
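A further variant, not part of this answer and offered only as a sketch, avoids building K at all: evaluate the per-k term for every k with whole-row and whole-column numpy operations, then subtract the two entries for k == r and k == s. When r == s every term is zero (the flow differences vanish), so the result is 0, as in deltaC. delta_all_k is a hypothetical name; flow and dist are assumed to be (n, n) numpy arrays and sol an integer numpy array.

```python
import numpy as np


def delta_all_k(flow, dist, sol, r, s):
    """Vectorized swap delta over all k, with the k == r and k == s terms removed.

    flow, dist: (n, n) numpy arrays; sol: 1-D integer numpy array (a permutation).
    """
    sol_r, sol_s = sol[r], sol[s]
    # Simplified two-term expression, evaluated for every k at once.
    terms = ((flow[r, :] - flow[s, :]) * (dist[sol_s, sol] - dist[sol_r, sol])
             + (flow[:, r] - flow[:, s]) * (dist[sol, sol_s] - dist[sol, sol_r]))
    # Drop the contributions of k == r and k == s instead of indexing with K.
    return terms.sum() - terms[r] - terms[s]
```

The two extra entries are wasted work, which is negligible for large n; for small n the per-call numpy overhead discussed above still applies, so the plain-Python loop may remain faster there. With floating-point matrices, the subtraction can differ from a direct sum by rounding error.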

The full code is below for reference, partly in case someone else can take it further. (The function deltaC2 is your original code, optimized to take advantage of the simplified mathematical expression.)

def deltaC(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            delta += \
                stream_matrix[r][k] \
                * (distance_matrix[sol_s][sol[k]] - distance_matrix[sol_r][sol[k]]) + \
                stream_matrix[s][k] \
                * (distance_matrix[sol_r][sol[k]] - distance_matrix[sol_s][sol[k]]) + \
                stream_matrix[k][r] \
                * (distance_matrix[sol[k]][sol_s] - distance_matrix[sol[k]][sol_r]) + \
                stream_matrix[k][s] \
                * (distance_matrix[sol[k]][sol_r] - distance_matrix[sol[k]][sol_s])
    return delta

import numpy as np

def deltaC_np(r, s, sol):
    sol_r, sol_s = sol[r], sol[s]

    if r==s:
        K=np.arange(n-1)
        K[r:] += 1
    else:
        K=np.arange(n-2)
        if r<s:
            K[r:] += 1
            K[s-1:] += 1
        else:
            K[s:] += 1
            K[r-1:] += 1
    #K = np.array([i for i in xrange(n) if i!=r and i!=s]) #TOO SLOW

    return np.sum(
        (stream_matrix_np[r,K] - stream_matrix_np[s,K]) \
        *  (distance_matrix_np[sol_s,sol[K]] - distance_matrix_np[sol_r,sol[K]]) + \
        (stream_matrix_np[K,r] - stream_matrix_np[K,s]) \
        * (distance_matrix_np[sol[K],sol_s] - distance_matrix_np[sol[K],sol_r]))

def deltaC2(r, s, sol):
    delta = 0
    sol_r, sol_s = sol[r], sol[s]
    for k in xrange(n):
        if k != r and k != s:
            sol_k = sol[k]
            delta += \
                (stream_matrix[r][k] - stream_matrix[s][k]) \
                * (distance_matrix[sol_s][sol_k] - distance_matrix[sol_r][sol_k]) \
                + \
                (stream_matrix[k][r] - stream_matrix[k][s]) \
                * (distance_matrix[sol_k][sol_s] - distance_matrix[sol_k][sol_r])
    return delta


import time

N=200

elapsed1s = []
elapsed2s = []
elapsed3s = []
ns = range(10,410,10)
for n in ns:
    distance_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    stream_matrix_np=np.random.uniform(0,n**2,size=(n,n))
    distance_matrix=distance_matrix_np.tolist()
    stream_matrix=stream_matrix_np.tolist()
    sol  = range(n-1,-1,-1)
    sol_np  = np.array(range(n-1,-1,-1))

    Is = np.random.randint(0,n-1,4)
    Js = np.random.randint(0,n-1,4)

    total1 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total1 += deltaC(i,j, sol)
    elapsed1 = (time.clock() - start)

    total2 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total2 += deltaC_np(i,j, sol_np)
    elapsed2 = (time.clock() - start)

    total3 = 0
    start = time.clock()
    for reps in xrange(N):
        for i in Is:
            for j in Js:
                total3 += deltaC2(i,j, sol_np)
    elapsed3 = (time.clock() - start)

    print n, elapsed1, elapsed2, elapsed3, total1, total2, total3
    elapsed1s.append(elapsed1)
    elapsed2s.append(elapsed2)
    elapsed3s.append(elapsed3)

    #Check errors of one method against another
    #err = 0
    #for i in range(min(n,50)):
    #    for j in range(min(n,50)):
    #        err += np.abs(deltaC(i,j,sol)-deltaC_np(i,j,sol_np))
    #print err
import matplotlib.pyplot as plt

plt.plot(ns, elapsed1s, label='Original',lw=2)
plt.plot(ns, elapsed3s, label='Optimized',lw=2)
plt.plot(ns, elapsed2s, label='numpy',lw=2)
plt.legend(loc='upper left', prop={'size':16})
plt.xlabel('matrix size')
plt.ylabel('time')
plt.show()

And here is the original graph, before optimizing out the list comprehension in deltaC_np:

[Figure: run time vs. matrix size, before the list comprehension in deltaC_np was optimized]

Answer 1: score 6, accepted, 2014-04-12 23:13:52
