Using sklearn Linear Regression and PCA in a single Pipeline

I have a Pandas data frame with 20 numeric features and a numeric response column. I would like to first apply PCA to bring the dimensionality down to 10 and then run Linear Regression to predict the numeric response. Currently I do this in two separate steps:

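from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True))])  # 20 features -> 10 components
newDF = pipeline.fit_transform(numericDF)

Y = df["Response"]
model = LinearRegression()
model.fit(newDF, Y)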

Is there a way to add Linear Regression to the pipeline above? I ask because:

  1. fit_transform is not supported by LinearRegression.
  2. fit_predict can't be used with PCA.
  3. It's not a one-off use case.

How can I run PCA and then Linear Regression in the same pipeline?

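The order of the pipeline steps matters. Every step except the last must be a transformer, i.e. implement fit() and transform() (or fit_transform()), while the last step can be any estimator and may implement predict(). This also matches the logical order: you first transform your features and then apply a predictive classification or regression model. Calling fit() on the pipeline fits each transformer in turn, passes the transformed data on to the next step, and finally fits the model; calling predict() applies the same fitted transformations before predicting.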
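To make the method requirement concrete, here is a small sketch that inspects which methods each of the three estimators exposes. The `capabilities` dictionary is just an illustrative name, not part of sklearn.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Record, per estimator class, whether it can transform data and whether it can predict.
capabilities = {}
for est in (StandardScaler(), PCA(), LinearRegression()):
    capabilities[type(est).__name__] = {
        "transform": hasattr(est, "transform"),
        "predict": hasattr(est, "predict"),
    }

for name, methods in capabilities.items():
    print(name, methods)
```

StandardScaler and PCA expose transform() but not predict(), so they can sit in the middle of a pipeline; LinearRegression exposes predict() but not transform(), so it belongs at the end.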

Y = df["Response"]
X=...
pipeline = Pipeline([('scaling', StandardScaler()),
                     ('pca', PCA(n_components=10, whiten=True)),
                     ('regr', LinearRegression())])
# LinearRegression has no fit_predict(), so fit the whole pipeline and then predict.
pipeline.fit(X, Y)
predictions = pipeline.predict(X)
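Since the original data frame isn't shown, here is a self-contained sketch of the same idea on synthetic data. The random features, the column names `f0`..`f19`, and the train/test split are assumptions made for illustration only; they stand in for the asker's 20 numeric features and "Response" column.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data: 200 rows, 20 numeric features.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(200, 20)),
                        columns=[f"f{i}" for i in range(20)])
# Hypothetical response: a noisy linear combination of the features.
response = features.to_numpy() @ rng.normal(size=20) + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    features, response, random_state=0)

pipeline = Pipeline([
    ("scaling", StandardScaler()),
    ("pca", PCA(n_components=10, whiten=True)),
    ("regr", LinearRegression()),
])

# One fit() call fits the scaler, the PCA and the regression in order.
pipeline.fit(X_train, y_train)

# predict() and score() reuse the fitted scaler and PCA on unseen rows.
predictions = pipeline.predict(X_test)
r2 = pipeline.score(X_test, y_test)

# Slicing off the last step gives the fitted transformers only,
# which is useful for inspecting the 10 PCA components.
reduced = pipeline[:-1].transform(X_test)
print(predictions.shape, reduced.shape, r2)
```

Because the fitted pipeline is a single object, it can be reused on new data or passed to tools such as cross_val_score, which covers the "not a one-off" requirement without repeating the scaling and PCA steps by hand.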
