简体   繁体   English

使用 Gekko 的约束多线性回归

[英]Constrained Multi-Linear Regression using Gekko

I have a multilinear regression problem where I have the prior information about the range of the output (dependent variable y) - The prediction must always lie in that range.我有一个多元线性回归问题,其中我有关于 output(因变量 y)范围的先验信息 - 预测必须始终位于该范围内。

I want to find the coefficients (upper and lower bound) of each feature (independent variables) in order to make the linear regression model restricted to the desired range of output.我想找到每个特征(自变量)的系数(上限和下限),以便使线性回归 model 限制在 output 的所需范围内。

For example, I want to apply this solution on sklearn.datasets.load_boston where I know that the price of the house will be in the range of [10, 40] (the actual min and max values of y are [5,50]).例如,我想在 sklearn.datasets.load_boston 上应用这个解决方案,我知道房子的价格将在 [10, 40] 的范围内(y 的实际最小值和最大值是 [5,50] ).

In the following example, my objective is 10 < yp < 40 and based on this I want to find the min and max bound of all the coefficients在下面的示例中,我的目标是10 < yp < 40并且基于此我想找到所有系数的最小和最大界限

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
from gekko import GEKKO

# Loading the dataset
x, y = load_boston(return_X_y=True)

c  = m.Array(m.FV, x.shape[1]+1) #array of parameters and intercept

for ci in c:
    ci.STATUS = 1 #calculate fixed parameter
    ci.LOWER  = 1 #constraint: lower limit
    ci.UPPER  = 0 #constraint: lower limit

#Define variables
xd = m.Array(m.Param,x.shape[1])
for i in range(x.shape[1]):
    xd[i].value = x[i]
yd = m.Param(y); yp = m.Var()

#Equation of Linear Functions
y_pred = c[0]*xd[0]+\
         c[1]*xd[1]+\
         c[2]*xd[2]+\
         c[3]*xd[3]+\
         c[4]*xd[4]+\
         c[5]*xd[5]+\
         c[6]*xd[6]+\
         c[7]*xd[7]+\
         c[8]*xd[8]+\
         c[9]*xd[9]+\
         c[10]*xd[10]+\
         c[11]*xd[11]+\
         c[12]*xd[12]+\
         c[13]

#Inequality Constraints
yp = m.Var(lb=5,ub=40)
m.Equation(yp==y_pred)

#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)

#Solve
m.solve(disp=True)

#Retrieve parameter values
a = [i.value[0] for i in c]

print(a) 

The listed code either gives me a solution not found error.列出的代码要么给我一个未找到错误的解决方案。 Can you point out what I'm doing wrong here, or am I missing something important?你能指出我在这里做错了什么,还是我错过了一些重要的事情? Also, how can I get both the Upper and Lower bound of c for all independent variables?另外,如何获得所有自变量的上限和下限 c?

Try IMODE=2 for regression mode.尝试IMODE=2回归模式。 There are a few modifications such as x[:,i] to load the data and ci.LOWER=0 as the lower bounds and ci.UPPER=1 as the upper bound.有一些修改,例如x[:,i]加载数据和ci.LOWER=0作为下限和ci.UPPER=1作为上限。

from sklearn.datasets import load_boston
import numpy as np
import pandas as pd
from gekko import GEKKO
m = GEKKO(remote=False)
# Loading the dataset
x, y = load_boston(return_X_y=True)
n = x.shape[1]
c  = m.Array(m.FV, n+1) #array of parameters and intercept
for ci in c:
    ci.STATUS = 1 #calculate fixed parameter
    ci.LOWER  = 0 #constraint: lower limit
    ci.UPPER  = 1 #constraint: lower limit
#Load data
xd = m.Array(m.Param,n)
for i in range(n):
    xd[i].value = x[:,i]
yd = m.Param(y)
#Equation of Linear Functions
yp = m.Var(lb=5,ub=40)
m.Equation(yp==m.sum([c[i]*xd[i] \
               for i in range(n)]) + c[13])
#Minimize difference between actual and predicted y
m.Minimize((yd-yp)**2)
#Regression mode
m.options.IMODE=2
#APOPT solver
m.options.SOLVER = 1
#Solve
m.solve(disp=True)
#Retrieve parameter values
a = [i.value[0] for i in c]
print(a) 

The solution to this constrained problem is obtained with the IPOPT or APOPT (slightly faster).此约束问题的解是通过 IPOPT 或 APOPT(速度稍快)获得的。

 Number of state variables:    7604
 Number of total equations: -  7590
 Number of slack variables: -  0
 ---------------------------------------
 Degrees of freedom       :    14
 
 ----------------------------------------------
 Model Parameter Estimation with APOPT Solver
 ----------------------------------------------
 
 Iter    Objective  Convergence
    0  5.60363E+04  1.25000E+00
    1  1.87753E+05  6.66134E-16
    2  3.06630E+04  9.99201E-16
    3  3.06630E+04  3.84187E-16
    5  3.06630E+04  1.35308E-16
 Successful solution
 
 ---------------------------------------------------
 Solver         :  APOPT (v1.0)
 Solution time  :  0.48429999999999995 sec
 Objective      :  30662.976306748842
 Successful solution
 ---------------------------------------------------
 

[0.0, 0.11336851243, 0.0, 1.0, 0.43491543768, 
 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.037613878067,
 0.0, 1.0]

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM