
How to make the optimization of a mean squared error faster in Python

(I'm new to Stack Overflow, but I will try to describe my problem as well as I can.) For my thesis, I need to perform the optimization of a mean squared error problem as fast as possible. For this problem, I have been using the scipy.optimize.minimize method (with and without the Jacobian). However, the optimization is still too slow for what we want to do. (This program runs on a Mac with Python 3.9.)

So first, this is the function to minimize (I already tried to simplify the formula, but it didn't change the speed of the program):

    def _residuals_mse(self, coef, unshimmed_vec, coil_mat, factor):
        """ Objective function to minimize the mean squared error (MSE)

        Args:
            coef (numpy.ndarray): 1D array of channel coefficients
            unshimmed_vec (numpy.ndarray): 1D flattened array (point) of the masked unshimmed map
            coil_mat (numpy.ndarray): 2D flattened array (point, channel) of masked coils
                                      (axis 0 must align with unshimmed_vec)
            factor (float): Divide the result by 'factor'. This allows scaling the output of the
                            minimize function to avoid a positive directional linesearch error

        Returns:
            scalar: Residual for least squares optimization
        """

        # MSE regularized to minimize currents
        return np.mean((unshimmed_vec + np.sum(coil_mat * coef, axis=1, keepdims=False)) ** 2) / factor + \
            (self.reg_factor * np.mean(np.abs(coef) / self.reg_factor_channel))

This is the Jacobian of the function (there may be a way to make it faster, but I didn't succeed):

    def _residuals_mse_jacobian(self, coef, unshimmed_vec, coil_mat, factor):
        """ Jacobian of the function that we want to minimize (note that normally b is calculated somewhere else)

        Args:
            coef (numpy.ndarray): 1D array of channel coefficients
            unshimmed_vec (numpy.ndarray): 1D flattened array (point) of the masked unshimmed map
            coil_mat (numpy.ndarray): 2D flattened array (point, channel) of masked coils
                                      (axis 0 must align with unshimmed_vec)
            factor (float): Divide the result by 'factor' (see _residuals_mse)

        Returns:
            jacobian (numpy.ndarray): 1D array of the gradient of the MSE function to minimize
        """
        b = 2 / (unshimmed_vec.size * factor)
        jacobian = np.array([
            b * np.sum((unshimmed_vec + np.matmul(coil_mat, coef)) * coil_mat[:, j]) +
            np.sign(coef[j]) * (self.reg_factor / (9 * self.reg_factor_channel[j]))
            for j in range(coef.size)
        ])

        return jacobian

And this is the "main" program:

import numpy as np
import scipy.optimize as opt
from numpy.random import default_rng

# Note: this snippet is excerpted from a method of the class, hence the self references.
rand = default_rng(seed=0)
self.reg_factor_channel = rand.integers(1, 10, size=9)
self.reg_factor = 5
coef = np.zeros(9)
unshimmed_vec = np.random.randint(100, size=150)
coil_mat = np.random.randint(100, size=(150, 9))
factor = 2
currents_sp = opt.minimize(self._residuals_mse, coef,
                           args=(unshimmed_vec, coil_mat, factor),
                           method='SLSQP',
                           jac=self._residuals_mse_jacobian,
                           options={'maxiter': 1000})

On my computer, the optimization takes around 40 ms for a dataset of this size.

The matrices in the example are usually obtained after some modifications and can be way bigger, but to keep things clear and easy to test I chose some arbitrary ones here. In addition, this optimization is done many times (sometimes up to 50 times), so we are already using multiprocessing (to run different optimizations at the same time). However, on Mac, multiprocessing is slow to start because of the spawn start method (fork is not stable on Python 3.9). For this reason, I am trying to make a single optimization as fast as possible, so that we can maybe remove multiprocessing.

Do any of you know how to make this code faster in Python? Also, this code will be released as open source for users, so I can only use free solvers (unlike MOSEK).

Edit: I tried to run the code using a CVXPY model, adding this code right after the one just above:

        import cvxpy as cp

        m = currents_0.size  # currents_0: initial currents, same length as coef
        n = unshimmed_vec.size
        coef = cp.Variable(m)
        unshimmed_vec2 = cp.Parameter(n)
        coil_mat2 = cp.Parameter((n, m))
        unshimmed_vec2.value = unshimmed_vec
        coil_mat2.value = coil_mat

        x1 = unshimmed_vec2 + cp.matmul(coil_mat2, coef)
        x2 = cp.sum_squares(x1) / (factor * n)
        x3 = (self.reg_factor / self.reg_factor_channel) @ cp.abs(coef) / m
        obj = cp.Minimize(x2 + x3)
        prob = cp.Problem(obj)

        prob.solve(solver=cp.SCS)

However, this slows my code down even more, and it gives me a different value than scipy.optimize.minimize, so does anyone see a problem in this code?
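To compare the two results, the original objective can be evaluated at both solutions (a minimal diagnostic sketch reusing the names from the snippets above: currents_sp from SciPy, the CVXPY variable coef, and the attributes self.reg_factor / self.reg_factor_channel):

# Hypothetical check: evaluate the MSE objective at the SciPy solution
# (currents_sp.x) and at the CVXPY solution (coef.value).
for name, x in (("scipy", currents_sp.x), ("cvxpy", coef.value)):
    val = (np.mean((unshimmed_vec + coil_mat @ x) ** 2) / factor
           + self.reg_factor * np.mean(np.abs(x) / self.reg_factor_channel))
    print(name, val)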

I would suggest trying the library NLopt. It also has SLSQP as a nonlinear solver (among many others), and I have found it to be faster than SciPy optimize in many instances.
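For reference, a minimal sketch of how the same objective and Jacobian could be wired into NLopt's SLSQP (this assumes the nlopt Python bindings are installed, and uses random stand-in data shaped like the question's rather than the real maps):

import nlopt
import numpy as np
from numpy.random import default_rng

# Stand-in data shaped like the question's example (an assumption, not the real maps)
rand = default_rng(seed=0)
reg_factor = 5
reg_factor_channel = rand.integers(1, 10, size=9)
reg_vector = reg_factor / len(reg_factor_channel) / reg_factor_channel
unshimmed_vec = rand.integers(100, size=150)
coil_mat = rand.integers(100, size=(150, 9))
factor = 2


def objective(coef, grad):
    # NLopt passes `grad` by reference; fill it in place when its size is non-zero.
    inner = unshimmed_vec + coil_mat @ coef
    if grad.size > 0:
        grad[:] = 2 / (inner.size * factor) * (inner @ coil_mat) + np.sign(coef) * reg_vector
    return float(inner.dot(inner) / (inner.size * factor) + np.abs(coef).dot(reg_vector))


opt_nl = nlopt.opt(nlopt.LD_SLSQP, 9)  # gradient-based SLSQP, 9 unknowns
opt_nl.set_min_objective(objective)
opt_nl.set_xtol_rel(1e-9)
opt_nl.set_maxeval(1000)
x = opt_nl.optimize(np.zeros(9))
print(x, opt_nl.last_optimum_value())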

However, you're talking about 50 ms per run; you won't get down to 5 ms.

If you're looking to squeeze out as much performance as possible, I would probably go to the metal and re-implement the objective function and Jacobian in Fortran (or C), and then use f2py (or Cython) to bridge them to Python. That looks like a bit of overkill to me, though.

I will make some sweeping assumptions:

  • that we can ignore _criteria_func and instead optimize _residuals_mse;
  • that none of this needs to be in a class;
  • that, unlike in your example, reg_factor_channel will never have zeros; and
  • that your bounds and constraints can all be ignored (though you have not made this clear).

Recognize that your inner expressions can be simplified (a quick numerical check follows the list below):

  • np.sum(coil_mat * coef, axis=1), since it uses a broadcast, is really just a matrix multiplication
  • mean(x ** 2) on a vector is really just a self-dot-product divided by the length
  • some of your scalar factors and vector coefficients can be combined outside of the function
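As a quick numerical check of the first two points (a minimal sketch on stand-in random arrays):

import numpy as np

rng = np.random.default_rng(0)
coil_mat = rng.integers(100, size=(150, 9))
coef = rng.normal(size=9)
v = coil_mat @ coef

# the broadcasted product summed over the channel axis is a matrix-vector product
assert np.allclose(np.sum(coil_mat * coef, axis=1), coil_mat @ coef)
# the mean of squares is a self-dot-product divided by the length
assert np.isclose(np.mean(v ** 2), v.dot(v) / len(v))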

This leaves us with the following, starting without the Jacobian:

import numpy as np
from numpy.random import default_rng
from scipy import optimize as opt
from timeit import timeit

rand = default_rng(seed=0)
reg_factor = 5
reg_factor_channel = rand.integers(1, 10, size=9)
reg_vector = reg_factor / len(reg_factor_channel) / reg_factor_channel


def residuals_mse(
    coef: np.ndarray,
    unshimmed_vec: np.ndarray,
    coil_mat: np.ndarray,
    factor: float,
) -> float:
    inner = unshimmed_vec + coil_mat@coef
    return inner.dot(inner)/len(inner)/factor + np.abs(coef).dot(reg_vector)


def old_residuals_mse(coef, unshimmed_vec, coil_mat, factor):
    return np.mean(
        (unshimmed_vec + np.sum(coil_mat * coef, axis=1, keepdims=False)) ** 2) / factor + (
            reg_factor * np.mean(np.abs(coef) / reg_factor_channel))


def main() -> None:
    unshimmed_vec = rand.integers(100, size=150)
    coil_mat = rand.integers(100, size=(150, 9))
    factor = 2
    args = unshimmed_vec, coil_mat, factor

    currents_sp = None
    def run():
        nonlocal currents_sp
        currents_sp = opt.minimize(
            fun=residuals_mse,
            x0=np.zeros_like(reg_factor_channel),
            args=args,
            method='SLSQP',
        )

    t = timeit(run, number=1)
    print(currents_sp)
    print(t, 'seconds')

    r_old = old_residuals_mse(currents_sp.x, *args)
    assert np.isclose(r_old, currents_sp.fun)


if __name__ == '__main__':
    main()

with output

 message: Optimization terminated successfully
 success: True
  status: 0
     fun: 435.166150155064
       x: [-1.546e-01 -8.305e-02 -1.637e-01 -1.106e-01 -1.033e-01
           -8.792e-02 -9.908e-02 -8.666e-02 -1.217e-01]
     nit: 7
     jac: [-1.179e-01 -1.621e-01 -1.112e-01 -1.765e-01 -1.678e-01
           -1.570e-01 -1.456e-01 -1.722e-01 -1.299e-01]
    nfev: 94
    njev: 7
0.012324300012551248 seconds

The Jacobian does indeed help, but it has been written in a way that is not properly vectorised. Once vectorised (see the gradient sketch just below for the expression being implemented), it looks like:
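For reference, the componentwise gradient being implemented here is (a sketch, writing u for unshimmed_vec, M for coil_mat, c for coef, n = unshimmed_vec.size, m = coef.size, \lambda for reg_factor and r for reg_factor_channel):

\frac{\partial f}{\partial c_j}
  = \frac{2}{n \cdot \mathrm{factor}} \, (u + M c)^{\top} M_{:, j}
  + \operatorname{sign}(c_j) \, \frac{\lambda}{m \, r_j}
\qquad\Longrightarrow\qquad
\nabla f(c)
  = \frac{2}{n \cdot \mathrm{factor}} \, M^{\top} (u + M c)
  + \frac{\lambda}{m} \, \frac{\operatorname{sign}(c)}{r}

with the division by r taken elementwise, which is what the vectorised residuals_mse_jacobian below computes.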

import numpy as np
from numpy.random import default_rng
from scipy import optimize as opt
from timeit import timeit

rand = default_rng(seed=0)
reg_factor = 5
reg_factor_channel = rand.integers(1, 10, size=9)
reg_vector = reg_factor / len(reg_factor_channel) / reg_factor_channel


def residuals_mse(
    coef: np.ndarray,
    unshimmed_vec: np.ndarray,
    coil_mat: np.ndarray,
    factor: float,
) -> float:
    inner = unshimmed_vec + coil_mat@coef
    return inner.dot(inner)/len(inner)/factor + np.abs(coef).dot(reg_vector)


def old_residuals_mse(coef, unshimmed_vec, coil_mat, factor):
    return np.mean(
        (unshimmed_vec + np.sum(coil_mat * coef, axis=1, keepdims=False)) ** 2) / factor + (
            reg_factor * np.mean(np.abs(coef) / reg_factor_channel))


def residuals_mse_jacobian(
    coef: np.ndarray,
    unshimmed_vec: np.ndarray,
    coil_mat: np.ndarray,
    factor: float,
) -> np.ndarray:
    b = 2 / unshimmed_vec.size / factor
    return b * (
        unshimmed_vec + coil_mat@coef
    )@coil_mat + np.sign(coef) * reg_factor/coef.size/reg_factor_channel


def old_residuals_mse_jacobian(coef, unshimmed_vec, coil_mat, factor):
    b = (2 / (unshimmed_vec.size * factor))
    jacobian = np.array([
        b * np.sum((unshimmed_vec + np.matmul(coil_mat, coef)) * coil_mat[:, j]) +
        np.sign(coef[j]) * (reg_factor / (9 * reg_factor_channel[j]))
        for j in range(coef.size)
    ])

    return jacobian


def main() -> None:
    unshimmed_vec = rand.integers(100, size=150)
    coil_mat = rand.integers(100, size=(150, 9))
    factor = 2
    args = unshimmed_vec, coil_mat, factor

    currents_sp = None
    def run():
        nonlocal currents_sp
        currents_sp = opt.minimize(
            fun=residuals_mse,
            x0=np.zeros_like(reg_factor_channel),
            args=args,
            method='SLSQP',
            jac=residuals_mse_jacobian,
        )

    t = timeit(run, number=1)
    print(currents_sp)
    print(t, 'seconds')

    r_old = old_residuals_mse(currents_sp.x, *args)
    assert np.isclose(r_old, currents_sp.fun)

    j_new = residuals_mse_jacobian(currents_sp.x, *args)
    j_old = old_residuals_mse_jacobian(currents_sp.x, *args)
    assert np.allclose(j_old, j_new)


if __name__ == '__main__':
    main()

with output

 message: Optimization terminated successfully
 success: True
  status: 0
     fun: 435.1661470650057
       x: [-1.546e-01 -8.305e-02 -1.637e-01 -1.106e-01 -1.033e-01
           -8.792e-02 -9.908e-02 -8.666e-02 -1.217e-01]
     nit: 7
     jac: [-4.396e-02 -8.791e-02 -3.385e-02 -9.817e-02 -9.516e-02
           -8.223e-02 -7.154e-02 -9.907e-02 -5.939e-02]
    nfev: 31
    njev: 7
0.005370599974412471 seconds
