
Constrained Linear Regression in Python

I have a classic linear regression problem of the form:

y = X b

where y is a response vector, X is a matrix of input variables, and b is the vector of fit parameters I am searching for.

Python provides b = numpy.linalg.lstsq(X, y) for solving problems of this form.
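For reference, the unconstrained fit looks like this (a minimal sketch on hypothetical synthetic data; the rcond=None argument just silences a NumPy deprecation warning):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((20, 3))             # 20 samples, 3 input variables
b_true = np.array([1.0, 2.0, 3.0])  # made-up coefficients for illustration
y = X @ b_true                      # noiseless linear response

# lstsq returns a tuple: (solution, residuals, rank, singular values)
b, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(b)  # recovers b_true on noiseless data
```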

However, when I use this I tend to get either extremely large or extremely small values for the components of b.

I'd like to perform the same fit, but constrain the values of b between 0 and 255.

It looks like scipy.optimize.fmin_slsqp() is an option, but I found it extremely slow for the size of problem I'm interested in (X is something like 3375 by 1500, and hopefully even larger).

  1. Are there any other Python options for performing constrained least squares fits?
  2. Or are there Python routines for performing Lasso Regression or Ridge Regression or some other regression method which penalizes large b coefficient values?

You mention you would find Lasso Regression or Ridge Regression acceptable. These and many other constrained linear models are available in the scikit-learn package. Check out the section on generalized linear models.

Usually constraining the coefficients involves some kind of regularization parameter (C or alpha)---some of the models (the ones ending in CV) can use cross validation to automatically set these parameters. You can also further constrain models to use only positive coefficients---for example, there is an option for this on the Lasso model.
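As a sketch of that option (hypothetical synthetic data; positive=True restricts coefficients to be non-negative, and LassoCV picks alpha by cross validation):

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.random((100, 5))
coef_true = np.array([10.0, 50.0, 0.0, 200.0, 5.0])  # illustrative values
y = X @ coef_true + rng.normal(scale=0.1, size=100)

# Cross-validated Lasso restricted to non-negative coefficients
model = LassoCV(positive=True, cv=5).fit(X, y)
print(model.coef_)  # every entry is >= 0
```

Note that this gives a lower bound of 0 but no upper bound; for a two-sided box like [0, 255] you need a bounded solver instead.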

scipy-optimize-leastsq-with-bound-constraints on SO gives leastsq_bounds, which is scipy leastsq + bound constraints such as 0 <= x_i <= 255.
(Scipy leastsq wraps MINPACK, one of several implementations of the widely-used Levenberg–Marquardt algorithm, aka damped least-squares.
There are various ways of implementing bounds; leastsq_bounds is, I think, the simplest.)

As @conradlee says, you can find Lasso and Ridge Regression implementations in the scikit-learn package. These regressors serve your purpose if you just want your fit parameters to be small or positive.

However, if you want to impose any other range as a bound for the fit parameters, you can build your own constrained Regressor with the same package. See the answer by David Dale to this question for an example.
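For the specific case of box bounds on a linear model, SciPy also ships a dedicated bounded linear least-squares solver, scipy.optimize.lsq_linear (a sketch; the data here is hypothetical):

```python
import numpy as np
from scipy.optimize import lsq_linear

rng = np.random.default_rng(2)
X = rng.random((30, 3))
y = 500.0 * rng.random(30)  # targets chosen so the fit presses on the bounds

# Bounded linear least squares: minimize ||X b - y||^2 subject to 0 <= b_i <= 255
res = lsq_linear(X, y, bounds=(0, 255))
print(res.x)  # every component lies in [0, 255]
```

Because it exploits the linearity of the model, this is typically much faster than a general nonlinear solver like fmin_slsqp on large problems.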

I recently prepared some tutorials on Linear Regression in Python. Here is one of the options (Gekko) that includes constraints on the coefficients.

# Constrained Multiple Linear Regression
import numpy as np
nd = 100 # number of data sets
nc = 5   # number of inputs
x = np.random.rand(nd,nc)
y = np.random.rand(nd)

from gekko import GEKKO
m = GEKKO(remote=False); m.options.IMODE=2
c  = m.Array(m.FV,nc+1)
for ci in c:
    ci.STATUS=1
    ci.LOWER = -10
    ci.UPPER =  10
xd = m.Array(m.Param,nc)
for i in range(nc):
    xd[i].value = x[:,i]
yd = m.Param(y); yp = m.Var()
s =  m.sum([c[i]*xd[i] for i in range(nc)])
m.Equation(yp==s+c[-1])
m.Minimize((yd-yp)**2)
m.solve(disp=True)
a = [c[i].value[0] for i in range(nc+1)]
print('Solve time: ' + str(m.options.SOLVETIME))
print('Coefficients: ' + str(a))

It uses the nonlinear solver IPOPT to solve the problem, which performs better than the scipy.optimize.minimize solvers. There are other constrained optimization methods in Python as well, as discussed in Is there a high quality nonlinear programming solver for Python?
