
Regression with multi-dimensional targets

I am using scikit-learn to do regression, and my problem is the following: I need to do regression on several parameters (vectors) at once. This works fine with some regression approaches, such as ensemble.ExtraTreesRegressor and ensemble.RandomForestRegressor. Indeed, one can pass a vector of vectors as targets to the fit(X, y) method of those two regressors.
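For example, this kind of fit works (a minimal sketch with made-up toy data, just for illustration):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# made-up toy data: 10 samples, 3 features, 2 target dimensions
X = np.random.rand(10, 3)
y = np.random.rand(10, 2)

rf = RandomForestRegressor(n_estimators=10)
rf.fit(X, y)                # the 2-D target array is accepted directly
print(rf.predict(X).shape)  # (10, 2): one value per sample and per target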

However, when I try ensemble.GradientBoostingRegressor, ensemble.AdaBoostRegressor, or linear_model.SGDRegressor, the regressor fails to fit the model because it expects 1-dimensional values as targets (the y argument of fit(X, y)). This means that with those regression methods I can estimate only one parameter at a time. That is not suitable for my problem, because I need to estimate about 20 parameters and fitting them one by one would take too long. On the other hand, I really would like to test those approaches.
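For illustration, a minimal sketch of the failure (made-up toy data; with a recent scikit-learn the call raises an error about the shape of y):

import numpy as np
from sklearn.linear_model import SGDRegressor

X = np.random.rand(10, 3)
y = np.random.rand(10, 2)

sgd = SGDRegressor()
try:
    sgd.fit(X, y)       # rejected: only a 1-D target vector is accepted
except ValueError as exc:
    print(exc)          # complains that y should be a 1-d array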

So, my question is: does anyone know of a way to fit the model once and estimate several parameters with ensemble.GradientBoostingRegressor, ensemble.AdaBoostRegressor, and linear_model.SGDRegressor?

I hope I've been clear enough...

As I read it, what you have is a problem of multiple multivariate regression.

Not every regression method in scikit-learn can handle this sort of problem, and you should consult the documentation of each one to find out. In particular, neither SGDRegressor, GradientBoostingRegressor, nor AdaBoostRegressor supports it at the moment: fit(X, y) specifies X : array-like, shape = [n_samples, n_features] and y : array-like, shape = [n_samples].

However, you can use other methods in scikit-learn. For example, linear models:

from sklearn import linear_model
# multivariate input
X = [[0., 0.], [1., 1.], [2., 2.], [3., 3.]]
# univariate output
Y = [0., 1., 2., 3.]
# multivariate output
Z = [[0., 1.], [1., 2.], [2., 3.], [3., 4.]]

# ordinary least squares
clf = linear_model.LinearRegression()
# univariate
clf.fit(X, Y)
clf.predict([[1., 0.]])
# multivariate
clf.fit(X, Z)
clf.predict([[1., 0.]])

# Ridge (note: linear_model.BayesianRidge only accepts a 1-D target,
# so plain Ridge is used here for the multivariate case)
clf = linear_model.Ridge()
# univariate
clf.fit(X, Y)
clf.predict([[1., 0.]])
# multivariate
clf.fit(X, Z)
clf.predict([[1., 0.]])

# Lasso
clf = linear_model.Lasso()
# univariate
clf.fit(X, Y)
clf.predict([[1., 0.]])
# multivariate
clf.fit(X, Z)
clf.predict([[1., 0.]])

As already mentioned, only some models support multivariate output. If you want to use one of the others, you can use MultiOutputRegressor, a wrapper that fits one copy of the regressor per target (optionally in parallel).

You can use it like this:

from sklearn.datasets import load_linnerud
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

linnerud = load_linnerud()

X = linnerud.data
Y = linnerud.target

# to set number of jobs to the number of cores, use n_jobs=-1
MultiOutputRegressor(GradientBoostingRegressor(), n_jobs=-1).fit(X, Y)
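
A minimal follow-up sketch of what the fitted wrapper gives back, reusing the linnerud data loaded above:

model = MultiOutputRegressor(GradientBoostingRegressor(), n_jobs=-1).fit(X, Y)

# one GradientBoostingRegressor is fitted per target column
print(len(model.estimators_))    # 3, since linnerud has 3 targets
print(model.predict(X).shape)    # (20, 3): one prediction per sample and target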
