Scikit-learn polynomial linear regression and the order of coefficients from PolynomialFeatures

Question

I'm fitting a simple polynomial regression model, and I want to get the coefficients from the fitted model.

Given the prep code:

import pandas as pd
from itertools import product
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# data creation
sa = [1, 0, 1, 2, 3]
sb = [2, 1, 0, 1, 2]
raw = {'a': [], 'b': [], 'w': []}
for (ai, av), (bi, bv) in product(enumerate(sa), enumerate(sb)):
    raw['a'].append(ai)
    raw['b'].append(bi)
    raw['w'].append(av + bv)
data = pd.DataFrame(raw)

# regression
x = data[['a', 'b']].values
y = data['w']
poly = PolynomialFeatures(2)
linr = LinearRegression()
model = make_pipeline(poly, linr)
model.fit(x, y)

From this answer, I know the coefficients can be obtained with

>>> model.steps[1][1].coef_
array([  0.00000000e+00,  -5.42857143e-01,  -1.71428571e+00,
         2.85714286e-01,   1.72774835e-16,   4.28571429e-01])

But this returns a 1-dimensional array, and I'm not sure which numbers correspond to which variables.

Are they ordered as a⁰, a¹, a², b⁰, b¹, b² or as a⁰, b⁰, a¹, b¹, a², b²?

Answer 1

You can use the get_feature_names() method of PolynomialFeatures to find out the order. (In scikit-learn 1.0 and later the method is called get_feature_names_out(); get_feature_names() was removed in 1.2.)

In the pipeline you can do this:

model.steps[0][1].get_feature_names()

# Output:
['1', 'x0', 'x1', 'x0^2', 'x0 x1', 'x1^2']

If you have the names of the input features ('a' and 'b' in your case), you can pass them in to get the actual term names.

model.steps[0][1].get_feature_names(['a', 'b'])

# Output:
['1', 'a', 'b', 'a^2', 'a b', 'b^2']
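To tie this back to the question, the names can be zipped with coef_ so each number is labelled by its term. This is only a sketch: it rebuilds the question's data with NumPy instead of pandas, looks the steps up through named_steps (make_pipeline names each step after its lowercased class name), and falls back to get_feature_names() when get_feature_names_out() is not available. The variable names coef_by_term, poly and linr are chosen here for illustration.

```python
from itertools import product

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Same data as in the question: w = sa[a] + sb[b] for every index pair (a, b)
sa = [1, 0, 1, 2, 3]
sb = [2, 1, 0, 1, 2]
rows = [(ai, bi, av + bv)
        for (ai, av), (bi, bv) in product(enumerate(sa), enumerate(sb))]
data = np.array(rows, dtype=float)
x, y = data[:, :2], data[:, 2]

model = make_pipeline(PolynomialFeatures(2), LinearRegression())
model.fit(x, y)

# make_pipeline names each step after its lowercased class name
poly = model.named_steps["polynomialfeatures"]
linr = model.named_steps["linearregression"]

# Newer releases expose get_feature_names_out(); older ones get_feature_names()
if hasattr(poly, "get_feature_names_out"):
    names = list(poly.get_feature_names_out(["a", "b"]))
else:
    names = list(poly.get_feature_names(["a", "b"]))

# Label each coefficient with the term it multiplies
coef_by_term = dict(zip(names, linr.coef_))
for name, coef in coef_by_term.items():
    print(f"{name:>4}: {coef: .6f}")
print("intercept:", linr.intercept_)
```

The entry for the term '1' is the coefficient on the constant column; with the default fit_intercept=True the constant part of the fit is reported separately in intercept_.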

Answer 2

First, the terms of a degree-2 polynomial in a and b are 1, a, b, a^2, ab, and b^2, and the scikit-learn implementation produces them in this order. You can verify this by creating a simple set of inputs, e.g.

import numpy as np

x = np.array([[2, 3], [2, 3], [2, 3]])
print(x)

# Output:
[[2 3]
 [2 3]
 [2 3]]

And then creating the polynomial features:

from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(2)
x_poly = poly.fit_transform(x)
print(x_poly)

# Output:
[[1. 2. 3. 4. 6. 9.]
 [1. 2. 3. 4. 6. 9.]
 [1. 2. 3. 4. 6. 9.]]

You can see that, not counting the bias term 1, the first and second features are a and b, the third is a^2 (i.e. 2^2 = 4), the fourth is ab = 2*3 = 6, and the last is b^2 = 3^2 = 9. That is, your model is:

w = c0 + c1*a + c2*b + c3*a^2 + c4*a*b + c5*b^2

where c1, ..., c5 are coef_[1], ..., coef_[5] and the constant c0 is coef_[0] plus the fitted intercept_.
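The ordering can also be reproduced without scikit-learn, which makes the rule explicit. The sketch below assumes, as I understand the implementation, that terms are grouped by total degree and, within each degree, enumerated as combinations with replacement of the feature indices. The helper names poly_terms, expand and term_name are hypothetical, made up for this illustration.

```python
from itertools import chain, combinations_with_replacement
from math import prod


def poly_terms(n_features, degree):
    """Index tuples for every term, grouped by total degree 0, 1, ..., degree."""
    return list(chain.from_iterable(
        combinations_with_replacement(range(n_features), d)
        for d in range(degree + 1)
    ))


def expand(row, degree):
    """Evaluate every term for one sample; the empty tuple yields the bias 1."""
    return [prod(row[i] for i in term) for term in poly_terms(len(row), degree)]


def term_name(term, names):
    """Readable label for one term, e.g. (0, 0) -> 'a^2' and (0, 1) -> 'a b'."""
    if not term:
        return "1"
    parts = []
    for i in sorted(set(term)):
        power = term.count(i)
        parts.append(names[i] if power == 1 else f"{names[i]}^{power}")
    return " ".join(parts)


terms = poly_terms(2, 2)
labels = [term_name(t, ["a", "b"]) for t in terms]
values = expand([2, 3], 2)

# Print each term next to its value for the sample a=2, b=3
for label, value in zip(labels, values):
    print(f"{label:>4} = {value}")
```

For the sample a=2, b=3 this gives the same row as the fit_transform output above, so the position of each value tells you which term the matching entry of coef_ multiplies.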
