Optimizing Many Matrix Operations in Python / Numpy

While writing some numerical analysis code, I have hit a bottleneck in a function that makes many NumPy calls. I am not entirely sure how to approach further performance optimization.

Problem:

The function determines the error by calculating the following:

[Image: error function]

Code:

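import numpy as np

def foo(B_Mat, A_Mat):
    # Normalize |B_Mat| by its largest entry (Temp is a new array, B_Mat is untouched).
    Temp = np.absolute(B_Mat)
    Temp /= np.amax(Temp)
    # Square root of the summed absolute residuals, divided by the row count.
    return np.sqrt(np.sum(np.absolute(A_Mat - Temp*Temp))) / B_Mat.shape[0]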
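
Since the formula image is not available here, the following is a reconstruction read directly off the code of `foo`, not necessarily the formula as originally posted. The symbols $A$, $B$ and $N$ are my own labels for `A_Mat`, `B_Mat` and `B_Mat.shape[0]` (the number of rows):

```latex
E \;=\; \frac{1}{N}\,
\sqrt{\;\sum_{i,j}\,\left|\, A_{ij} \;-\; \left(\frac{|B_{ij}|}{\max_{k,l}|B_{kl}|}\right)^{2} \right|\;}
```

Each term in the sum corresponds to one element of `A_Mat - Temp*Temp`, where `Temp` is the normalized absolute value of `B_Mat`.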

What would be the best way to squeeze some extra performance out of this code? Would my best course of action be to perform the majority of the operations in a single for loop with Cython, to cut down on the temporary arrays?

Some of the operations in this implementation can be offloaded to the numexpr module, which is known to be very efficient for arithmetic computations. In our case, specifically, it can perform the squaring, summation, and absolute-value computations. A numexpr-based replacement for the last step of the original code would look like this:

import numexpr as ne

out = np.sqrt(ne.evaluate('sum(abs(A_Mat - Temp**2))'))/B_Mat.shape[0]

A further performance boost can be achieved by folding the normalization step into numexpr's evaluate expression, so that the division by the maximum no longer needs its own pass over the array. The entire function, modified to use numexpr, would then be:

def numexpr_app1(B_Mat, A_Mat):
    Temp = np.absolute(B_Mat)
    M = np.amax(Temp)
    # Normalization folded in: A_Mat is scaled by M**2 and the result divided by M at the end.
    return np.sqrt(ne.evaluate('sum(abs(A_Mat*M**2-Temp**2))'))/(M*B_Mat.shape[0])
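
To see why the expression in `numexpr_app1` gives the same value as the original function, here is a short derivation. It assumes $M = \max|B| > 0$ and writes $T = |B|$ and $N$ for `B_Mat.shape[0]`; these symbols are my own shorthand for the variables in the code:

```latex
\begin{aligned}
\left| A_{ij} - \left(\frac{T_{ij}}{M}\right)^{2} \right|
  &= \frac{\left| M^{2} A_{ij} - T_{ij}^{2} \right|}{M^{2}} \\[4pt]
\frac{1}{N}\sqrt{\sum_{i,j} \left| A_{ij} - \left(\frac{T_{ij}}{M}\right)^{2} \right|}
  &= \frac{1}{N}\sqrt{\frac{1}{M^{2}} \sum_{i,j} \left| M^{2} A_{ij} - T_{ij}^{2} \right|} \\[4pt]
  &= \frac{1}{M N}\sqrt{\sum_{i,j} \left| M^{2} A_{ij} - T_{ij}^{2} \right|}
\end{aligned}
```

The last line is what the `return` statement computes: the sum is taken over `abs(A_Mat*M**2 - Temp**2)` and the square root is divided by `M*B_Mat.shape[0]`. If `B_Mat` is all zeros, then $M = 0$ and both versions divide by zero.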

Runtime test:

In [198]: # Random arrays
     ...: A_Mat = np.random.randn(4000,5000)
     ...: B_Mat = np.random.randn(4000,5000)
     ...: 

In [199]: np.allclose(foo(B_Mat, A_Mat),numexpr_app1(B_Mat, A_Mat))
Out[199]: True

In [200]: %timeit foo(B_Mat, A_Mat)
1 loops, best of 3: 891 ms per loop

In [201]: %timeit numexpr_app1(B_Mat, A_Mat)
1 loops, best of 3: 400 ms per loop
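
On the question of cutting down temporary arrays: if numexpr is not available, some of the temporaries can also be avoided in plain NumPy by reusing a single buffer through the `out=` argument of the ufuncs. The following is only a minimal sketch under stated assumptions, not a measured improvement. The name `foo_inplace` is hypothetical, the inputs are assumed to be floating-point arrays of the same shape, and I have not timed it against the versions above.

```python
import numpy as np

def foo_inplace(B_Mat, A_Mat):
    # Hypothetical variant of foo that reuses one full-size buffer.
    # Assumes float arrays of equal shape; B_Mat and A_Mat are not modified.
    temp = np.absolute(B_Mat)             # the only full-size allocation
    temp /= temp.max()                    # normalize in place
    np.multiply(temp, temp, out=temp)     # square in place
    np.subtract(A_Mat, temp, out=temp)    # residual A - temp**2, in place
    np.absolute(temp, out=temp)           # absolute residual, in place
    return np.sqrt(temp.sum()) / B_Mat.shape[0]
```

It can be checked against `foo` with `np.allclose` and timed with `%timeit` in the same way as in the session above.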
