Find linear combination of vectors that is the best fit for a target vector

I am trying to find weights across a number of forecasts to give a result that is as close as possible (in, say, mean squared error) to a known target.

Here is a simplified example showing three different types of forecast across four data points:

target = [1.0, 1.02, 1.01, 1.04]  # all approx 1.0
forecasts = [
    [0.9, 0.91, 0.92, 0.91],  # all approx 0.9
    [1.1, 1.11, 1.13, 1.11],  # all approx 1.1
    [1.21, 1.23, 1.21, 1.23]  # all approx 1.2
]

where one forecast is always approximately 0.9, one is always approximately 1.1, and one is always approximately 1.2.

I'd like an automated way of finding weights of approximately [0.5, 0.5, 0.0] for the three forecasts, because averaging the first two forecasts and ignoring the third is very close to the target. Ideally the weights would be constrained to be non-negative and to sum to 1.
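
To make the goal concrete, this is the kind of evaluation I have in mind (plain numpy, just illustrating the objective rather than solving it):

import numpy as np

target = np.array([1.0, 1.02, 1.01, 1.04])
forecasts = np.array([
    [0.9, 0.91, 0.92, 0.91],
    [1.1, 1.11, 1.13, 1.11],
    [1.21, 1.23, 1.21, 1.23],
])

# Candidate weights: average the first two forecasts, ignore the third.
weights = np.array([0.5, 0.5, 0.0])
combined = weights @ forecasts           # weighted combination, shape (4,)
mse = np.mean((combined - target) ** 2)  # mean squared error against the target
print(combined, mse)                     # approx [1.0, 1.01, 1.025, 1.01], 0.0003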

I think I need to use some form of linear programming or quadratic programming to do this. I have installed the Python quadprog library, but I'm not sure how to translate this problem into the form that solvers like this require. Can anyone point me in the right direction?

If I understand you correctly, you want to model an optimization problem and solve it. If you are interested in the general case (without any constraints), your problem is pretty close to the regular least-squares problem (which you can solve with scikit-learn, for example).
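
For that unconstrained case, a minimal sketch (using numpy's lstsq here; scikit-learn's LinearRegression with fit_intercept=False would give the same fit) could look like this:

import numpy as np

forecasts = np.array([
    [0.9, 0.91, 0.92, 0.91],
    [1.1, 1.11, 1.13, 1.11],
    [1.21, 1.23, 1.21, 1.23],
])
target = np.array([1.0, 1.02, 1.01, 1.04])

# Unconstrained least squares: find w minimizing ||forecasts.T @ w - target||^2.
w, residuals, rank, sv = np.linalg.lstsq(forecasts.T, target, rcond=None)
print(w)  # note: these weights may be negative and need not sum to 1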

I recommend using the cvxpy library for modeling the optimization problem. It's a convenient way to model a convex optimization problem, and you can choose which solver you want to work in the background.

Expanding the cvxpy least-squares example by adding the constraints you mentioned:

# Import packages.
import cvxpy as cp
import numpy as np

# Generate data.
m = 20
n = 15
np.random.seed(1)
A = np.random.randn(m, n)
b = np.random.randn(m)

# Define and solve the CVXPY problem.
x = cp.Variable(n)
cost = cp.sum_squares(A @ x - b)
prob = cp.Problem(cp.Minimize(cost), [x>=0, cp.sum(x)==1])
prob.solve()

# Print result.
print("\nThe optimal value is", prob.value)
print("The optimal x is")
print(x.value)
print("The norm of the residual is ", cp.norm(A @ x - b, p=2).value)

In this example, A (the matrix) is the matrix holding all of your vectors, x (the variable) is the weights, and b is the known target.

EDIT: example with your data:

import cvxpy as cp
import numpy as np

forecasts = np.array([
    [0.9, 0.91, 0.92, 0.91],
    [1.1, 1.11, 1.13, 1.11],
    [1.21, 1.23, 1.21, 1.23]
])

target = np.array([1.0, 1.02, 1.01, 1.04])
x = cp.Variable(forecasts.shape[0])
cost = cp.sum_squares(forecasts.T @ x - target)
prob = cp.Problem(cp.Minimize(cost), [x >= 0, cp.sum(x) == 1])
prob.solve()
print("\nThe optimal value is", prob.value)
print("The optimal x is")
print(x.value)

Output:

The optimal value is 0.0005306233766233817
The optimal x is
[ 6.52207792e-01 -1.45736370e-24  3.47792208e-01]

The result is approximately [0.65, 0, 0.35], which is different from the [0.5, 0.5, 0.0] you mentioned, but that depends on how you define your problem. This is the solution for the least-squares error.
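
As a quick sanity check (reusing the forecasts and target arrays defined above), you can compare the sum of squared errors of the two weight vectors directly:

def sse(w):
    # sum of squared errors of the weighted combination against the target
    return np.sum((forecasts.T @ np.asarray(w) - target) ** 2)

print(sse([0.5, 0.5, 0.0]))      # approx 0.00123
print(sse([0.652, 0.0, 0.348]))  # approx 0.00053 -- lower, hence preferred by the solver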

We can see this problem as a least-squares problem, which is indeed equivalent to quadratic programming. If I understand correctly, the weight vector you are looking for is a convex combination, so in least-squares form the problem is:

minimize  || [w0 w1 w2] * forecasts - target ||^2
    s.t.  w0 >= 0, w1 >= 0, w2 >= 0
          w0 + w1 + w2 == 1
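
Expanding the squared norm puts this into standard QP form, so you could also pass it to quadprog directly. Here is a sketch of that translation, assuming the quadprog package's solve_qp(G, a, C, b, meq) interface (which minimizes 1/2 x^T G x - a^T x subject to C^T x >= b, the first meq constraints being equalities) and assuming forecasts and target are the arrays from your question:

import numpy as np
import quadprog

F = np.array(forecasts)   # 3 x 4, one forecast per row
t = np.array(target)

# ||F.T w - t||^2 = w.T (F F.T) w - 2 (F t).T w + const,
# so the quadratic and linear terms of the QP are:
G = 2.0 * F @ F.T         # must be positive definite (add a tiny ridge if the solver complains)
a = 2.0 * F @ t

# Constraints: sum(w) == 1 (equality, via meq=1), then w >= 0.
C = np.column_stack([np.ones(3), np.eye(3)])
b = np.array([1.0, 0.0, 0.0, 0.0])

w = quadprog.solve_qp(G, a, C, b, meq=1)[0]
print(w)                  # approx [0.65, 0.0, 0.35]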

There is also a least-squares function you can use out of the box in the qpsolvers package:

import numpy as np
from qpsolvers import solve_ls

target = np.array(target)
forecasts = np.array(forecasts)
w = solve_ls(
    forecasts.T, target,
    G=-np.eye(3), h=np.zeros(3),                      # -I w <= 0, i.e. w >= 0
    A=np.array([1.0, 1.0, 1.0]), b=np.array([1.0]),   # sum(w) == 1
)

You can check in the documentation that the matrices G, h, A and b correspond to the problem above. Using quadprog as the backend solver, I get the following solution on my machine:

In [6]: w
Out[6]: array([6.52207792e-01, 9.94041282e-15, 3.47792208e-01])

In [7]: np.dot(w, forecasts)
Out[7]: array([1.00781558, 1.02129351, 1.02085974, 1.02129351])

This is the same solution as in Roim's answer. (CVXPY is indeed a great way to start!)
