Training different regressors with sklearn
I have a list of Xs and their output values Ys. Using the following code, I am able to train the following regressors:
The code:
import io
import re

import numpy as np
from sklearn.linear_model import LinearRegression, BayesianRidge
from sklearn.isotonic import IsotonicRegression
from sklearn import ensemble
from sklearn.svm import SVR
from sklearn.gaussian_process import GaussianProcess
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

def get_meteor_scores(infile):
    # Take the last token of every "Segment ... score: ..." line as a float.
    with io.open(infile, 'r') as fin:
        meteor_scores = [float(i.strip().split()[-1]) for
                         i in re.findall(r'Segment [0-9].* score\:.*\n',
                                         fin.read())]
    return meteor_scores

def get_sts_scores(infile):
    # One score per line.
    with io.open(infile, 'r') as fin:
        sts_scores = [float(i) for i in fin]
    return sts_scores

Xs = 'meteor.output.train'
Ys = 'score.train'
# Gets scores from https://raw.githubusercontent.com/alvations/USAAR-SemEval-2015/master/task02-USAAR-SHEFFIELD/x.meteor.train
meteor_scores = np.array(get_meteor_scores(Xs))
# Gets scores from https://raw.githubusercontent.com/alvations/USAAR-SemEval-2015/master/task02-USAAR-SHEFFIELD/score.train
sts_scores = np.array(get_sts_scores(Ys))
x = meteor_scores
y = sts_scores
n = len(sts_scores)
# Linear Regression
lr = LinearRegression()
lr.fit(x[:, np.newaxis], y)
# Bayesian Ridge Regression
br = BayesianRidge(compute_score=True)
br.fit(x[:, np.newaxis], y)
# Isotonic Regression
ir = IsotonicRegression()
y_ = ir.fit_transform(x, y)
# Gradient Boosting Regression
params = {'n_estimators': 500, 'max_depth': 4, 'min_samples_split': 1,
          'learning_rate': 0.01, 'loss': 'ls'}
gbr = ensemble.GradientBoostingRegressor(**params)
gbr.fit(x[:, np.newaxis], y)
But how do I train regressors for Support Vector Regression, Gaussian Process and Decision Tree Regressor?
When I tried the following to train Support Vector Regressors, I got an error:
from sklearn.svm import SVR
# Support Vector Regressions
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
svr_lin = SVR(kernel='linear', C=1e3)
svr_poly = SVR(kernel='poly', C=1e3, degree=2)
y_rbf = svr_rbf.fit(x, y)
y_lin = svr_lin.fit(x, y)
y_poly = svr_poly.fit(x, y)
[out]:
Traceback (most recent call last):
File "/home/alvas/git/USAAR-SemEval-2015/task02-somethingLiddat/carolling.py", line 47, in <module>
y_rbf = svr_rbf.fit(x, y)
File "/home/alvas/.local/lib/python2.7/site-packages/sklearn/svm/base.py", line 149, in fit
(X.shape[0], y.shape[0]))
ValueError: X and y have incompatible shapes.
X has 1 samples, but y has 10597.
The same happens when I try Gaussian Process:
from sklearn.gaussian_process import GaussianProcess
# Gaussian Process
gp = GaussianProcess(corr='squared_exponential', theta0=1e-1,
thetaL=1e-3, thetaU=1,
random_start=100)
gp.fit(x, y)
[out]:
Traceback (most recent call last):
File "/home/alvas/git/USAAR-SemEval-2015/task02-somethingLiddat/carolling.py", line 57, in <module>
gp.fit(x, y)
File "/home/alvas/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gaussian_process.py", line 271, in fit
X, y = check_arrays(X, y)
File "/home/alvas/.local/lib/python2.7/site-packages/sklearn/utils/validation.py", line 254, in check_arrays
% (size, n_samples))
ValueError: Found array with dim 10597. Expected 1
When running gp.fit(x[:,np.newaxis], y) I get this error:
Traceback (most recent call last):
File "/home/alvas/git/USAAR-SemEval-2015/task02-somethingLiddat/carolling.py", line 95, in <module>
gp.fit(x[:,np.newaxis], y)
File "/home/alvas/.local/lib/python2.7/site-packages/sklearn/gaussian_process/gaussian_process.py", line 301, in fit
raise Exception("Multiple input features cannot have the same"
Exception: Multiple input features cannot have the same target value.
When I tried Decision Tree Regressor:
from sklearn.tree import DecisionTreeRegressor
# Decision Tree Regression
dtr2 = DecisionTreeRegressor(max_depth=2)
dtr5 = DecisionTreeRegressor(max_depth=5)
dtr2.fit(x,y)
dtr5.fit(x,y)
[out]:
Traceback (most recent call last):
File "/home/alvas/git/USAAR-SemEval-2015/task02-somethingLiddat/carolling.py", line 47, in <module>
dtr2.fit(x,y)
File "/home/alvas/.local/lib/python2.7/site-packages/sklearn/tree/tree.py", line 140, in fit
n_samples, self.n_features_ = X.shape
ValueError: need more than 1 value to unpack
All these regressors require a two-dimensional X array of shape (n_samples, n_features), but your x is a 1-D array. So the only requirement is to convert x into a 2-D array for these regressors to work. This can be achieved using x[:, np.newaxis].
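As a small illustration of what that conversion does to the shape (a sketch with made-up values, not your METEOR scores), x[:, np.newaxis] turns a vector of n values into an n-by-1 matrix, and x.reshape(-1, 1) is an equivalent spelling:

```python
import numpy as np

# Made-up 1-D feature values standing in for the real scores.
x = np.array([0.1, 0.4, 0.35, 0.8])

print(x.shape)           # (4,): a single axis, so samples and features are ambiguous
x_2d = x[:, np.newaxis]  # append a second axis of length 1
print(x_2d.shape)        # (4, 1): 4 samples, 1 feature

# reshape(-1, 1) produces the same array.
assert np.array_equal(x_2d, x.reshape(-1, 1))
```

The y array stays 1-D; only the feature array needs the extra axis.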
Demo:
>>> import numpy as np
>>> from sklearn.svm import SVR
>>> # Support Vector Regressions
... svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
>>> svr_lin = SVR(kernel='linear', C=1e3)
>>> svr_poly = SVR(kernel='poly', C=1e3, degree=2)
>>> x=np.arange(10)
>>> y=np.arange(10)
>>> y_rbf = svr_rbf.fit(x[:,np.newaxis], y)
>>> y_lin = svr_lin.fit(x[:,np.newaxis], y)
>>> svr_poly = svr_poly.fit(x[:,np.newaxis], y)
>>> from sklearn.gaussian_process import GaussianProcess
>>> # Gaussian Process
... gp = GaussianProcess(corr='squared_exponential', theta0=1e-1,
... thetaL=1e-3, thetaU=1,
... random_start=100)
>>> gp.fit(x[:, np.newaxis], y)
GaussianProcess(beta0=None,
corr=<function squared_exponential at 0x7f46f3ebcf50>,
normalize=True, nugget=array(2.220446049250313e-15),
optimizer='fmin_cobyla', random_start=100,
random_state=<mtrand.RandomState object at 0x7f4702d97150>,
regr=<function constant at 0x7f46f3ebc8c0>, storage_mode='full',
theta0=array([[ 0.1]]), thetaL=array([[ 0.001]]),
thetaU=array([[1]]), verbose=False)
>>> from sklearn.tree import DecisionTreeRegressor
>>> # Decision Tree Regression
... dtr2 = DecisionTreeRegressor(max_depth=2)
>>> dtr5 = DecisionTreeRegressor(max_depth=2)
>>> dtr2.fit(x[:,np.newaxis],y)
DecisionTreeRegressor(compute_importances=None, criterion='mse', max_depth=2,
max_features=None, min_density=None, min_samples_leaf=1,
min_samples_split=2, random_state=None, splitter='best')
>>> dtr5.fit(x[:,np.newaxis],y)
DecisionTreeRegressor(compute_importances=None, criterion='mse', max_depth=2,
max_features=None, min_density=None, min_samples_leaf=1,
min_samples_split=2, random_state=None, splitter='best')
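Note that the GaussianProcess class used above belongs to older scikit-learn releases; it was deprecated in favour of GaussianProcessRegressor and later removed. A minimal sketch of the same reshape fix against a recent release, assuming toy data and illustrative hyperparameters (the RBF kernel, the alpha value and the tree depths are my choices, not taken from the question):

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Toy data (assumption): one feature with a linear target.
x = np.arange(10, dtype=float)
y = np.arange(10, dtype=float)
X = x.reshape(-1, 1)  # shape (10, 1): n_samples x n_features

models = {
    "svr_rbf": SVR(kernel="rbf", C=1e3, gamma=0.1),
    "tree_depth2": DecisionTreeRegressor(max_depth=2),
    "tree_depth5": DecisionTreeRegressor(max_depth=5),
    # alpha is added to the diagonal of the kernel matrix during fitting.
    "gp": GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6),
}

predictions = {}
for name, model in models.items():
    model.fit(X, y)                       # X must be 2-D, y stays 1-D
    predictions[name] = model.predict(X)
    print(name, predictions[name].shape)
```

Every estimator accepts the same (n_samples, 1) array, so the reshape only has to be done once.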
Preprocessing for GaussianProcess:
xu = np.unique(x) # get unique x values
idx = [np.where(x==x1)[0][0] for x1 in xu] # get corresponding indices for unique x values
gp.fit(xu[:,np.newaxis], y[idx]) # y[idx] selects y values corresponding to unique x values
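The same de-duplication can also be written with np.unique(..., return_index=True), which returns the index of the first occurrence of each unique value. Keeping only the first y for each x throws the other targets away, so one alternative (my suggestion, not something the question asked for) is to average the y values that share an x. A sketch with made-up data:

```python
import numpy as np

# Made-up data in which 0.2 and 0.5 each appear twice.
x = np.array([0.2, 0.5, 0.2, 0.9, 0.5])
y = np.array([1.0, 3.0, 2.0, 4.0, 5.0])

# Option 1: keep the first y seen for each unique x.
xu, first_idx = np.unique(x, return_index=True)
y_first = y[first_idx]

# Option 2: average all y values that share the same x.
xu, inverse = np.unique(x, return_inverse=True)
y_mean = np.bincount(inverse, weights=y) / np.bincount(inverse)

print(xu)       # unique x values, sorted: 0.2, 0.5, 0.9
print(y_first)  # first targets: 1.0, 3.0, 4.0
print(y_mean)   # averaged targets: 1.5, 4.0, 4.0
```

Either xu[:, np.newaxis] with y_first or with y_mean can then be passed to gp.fit.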
Multiple input features cannot have the same target value.
This means that the same input value appears more than once in your data, and this Gaussian process implementation does not allow a data point to be listed twice. Unfortunately, your dataset is no longer available, so I cannot check this, but that is what I think the cause is.
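One way to check this on your own data is to count how often each x value occurs. A minimal sketch, using made-up values in place of the missing dataset:

```python
import numpy as np

# Made-up feature values; 0.2 appears twice.
x = np.array([0.2, 0.5, 0.2, 0.9])

values, counts = np.unique(x, return_counts=True)
duplicated = values[counts > 1]       # x values that occur more than once
n_redundant = len(x) - len(values)    # rows beyond the first copy of each value

print(n_redundant)
print(duplicated)
```

If n_redundant is greater than zero, the input contains repeated points and needs de-duplicating before fitting.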