
Multivariate polynomial regression with Python

In extension of: scikit learn coefficients polynomialfeatures

What is a straightforward way of doing multivariate polynomial regression in Python?

Say we have N samples, each with 3 features, and for each sample 40 response variables (it may as well be any number, of course, but it is 40 in my case). We want a function that relates the 3 independent variables to the 40 response variables. For this, we train a polynomial model on N-1 of our samples and estimate the 40 response variables of the remaining sample. The dimensionalities of the independent-variable (X) and response-variable (y) training and test data:

X_train = [(N-1) * 3], y_train = [(N-1) * 40], X_test = [1 * 3], y_test = [1 * 40]

As I would expect, such an approach should yield:

y = intercept + a x1 + b x1^2 + c x2 + d x2^2 + e x3 + f x3^2 + g x1 x2 + h x1 x3 + i x2 x3

That is a total of 9 coefficients plus one intercept per response variable to describe the polynomial. If I use the method proposed earlier by David Maust in 2015:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)  # fit() returns the fitted pipeline, not predictions

# the LinearRegression step is the second element of the pipeline
coefficients = model.steps[1][1].coef_
intercepts = model.steps[1][1].intercept_

coefficients.shape

[Output: (40, 10)]
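
One way to see which term each of the 10 columns corresponds to is to ask the transformer for its generated feature names. A minimal sketch (get_feature_names_out assumes scikit-learn >= 1.0; older versions have get_feature_names instead):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree=2)
poly.fit(np.zeros((1, 3)))  # any array with 3 columns works; only the width matters
print(poly.get_feature_names_out())
# ['1' 'x0' 'x1' 'x2' 'x0^2' 'x0 x1' 'x0 x2' 'x1^2' 'x1 x2' 'x2^2']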

For every response variable we appear to end up with 10 coefficients plus one intercept, which is one coefficient more than I would expect. It is therefore unclear to me what these coefficients mean and how to construct the polynomial that describes our response variable. I really hope Stack Overflow can help me out! Hopefully I have defined my problem well enough.

As you pointed out, there are 9 coefficients and a bias term after the polynomial transformation. However, when you pass this N-by-10 matrix to sklearn's LinearRegression, it is interpreted as a 10-dimensional dataset. In addition, by default, sklearn fits the regression line with an intercept, so you end up with 10 coefficients and one intercept. The first coefficient (the one multiplying the constant bias column) will most likely be 0, though (at least that is what I obtained after testing my answers below with the data from here).
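
This can be checked on synthetic data; a minimal sketch (the shapes mirror the question, the data itself is random):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = rng.normal(size=(99, 3))   # N-1 = 99 samples, 3 features
y_train = rng.normal(size=(99, 40))  # 40 response variables per sample

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

# Column 0 of coef_ multiplies the constant bias column produced by
# PolynomialFeatures; with the default fit_intercept=True it comes out
# (numerically) zero for every response variable.
print(np.abs(model.steps[1][1].coef_[:, 0]).max())  # ~0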

To get your expected behaviour, I think you have two options (a quick comparison of both follows after the list):

  1. Disable the bias term in PolynomialFeatures:

model = make_pipeline(PolynomialFeatures(degree=2,include_bias=False), LinearRegression())

  2. Tell LinearRegression not to fit an intercept; the coefficient of the bias term will then act as the intercept, i.e. your intercept is model.steps[1][1].coef_[0]:

model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression(fit_intercept=False))
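
Both options can be verified on synthetic data; a minimal sketch (random data, shapes as in the question) showing that the two parameterisations agree:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = rng.normal(size=(99, 3))
y_train = rng.normal(size=(99, 40))

# Option 1: drop the bias column, keep the regression intercept.
m1 = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                   LinearRegression())
m1.fit(X_train, y_train)
print(m1.steps[1][1].coef_.shape)       # (40, 9)  -- the 9 expected coefficients
print(m1.steps[1][1].intercept_.shape)  # (40,)    -- one intercept per response

# Option 2: keep the bias column, drop the regression intercept.
m2 = make_pipeline(PolynomialFeatures(degree=2),
                   LinearRegression(fit_intercept=False))
m2.fit(X_train, y_train)
print(m2.steps[1][1].coef_.shape)       # (40, 10) -- column 0 is the intercept

# The two parameterisations describe the same polynomial:
assert np.allclose(m1.steps[1][1].intercept_, m2.steps[1][1].coef_[:, 0])
assert np.allclose(m1.steps[1][1].coef_, m2.steps[1][1].coef_[:, 1:])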

I hope this helps! Out of curiosity, what value do you get for model.steps[1][1].coef_[0]?
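
Either way, you can then write out the polynomial for any of the 40 response variables by pairing each coefficient with its generated feature name. A sketch using option 1 (the input names x1, x2, x3 are just labels for display):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_train = rng.normal(size=(99, 3))
y_train = rng.normal(size=(99, 40))

model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      LinearRegression())
model.fit(X_train, y_train)

names = model.steps[0][1].get_feature_names_out(['x1', 'x2', 'x3'])
coefs = model.steps[1][1].coef_           # shape (40, 9)
intercepts = model.steps[1][1].intercept_ # shape (40,)

j = 0  # first of the 40 response variables
terms = " + ".join(f"{c:.3g} {n}" for c, n in zip(coefs[j], names))
print(f"y_{j} = {intercepts[j]:.3g} + {terms}")
# prints something like: y_j = <intercept> + <a> x1 + <b> x2 + ... + <i> x3^2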
