How to use SGDRegressor in scikit-learn

Question

I am trying to figure out how to properly use scikit-learn's SGDRegressor model. To fit it to a dataset I need to call fit(X, y), where X is a numpy array of shape (n_samples, n_features) and y is a 1-D numpy array of length n_samples. I am trying to figure out what y is supposed to represent.

For instance, my data looks like this:

[Image of the data not available]

My features are years starting in 1972, and the values are the corresponding value for each year. I am trying to predict the values for future years, such as 2008 or 2012. I am assuming that each row in my data should represent a row/sample in X, where each element is the value for a year. In that case, what would y be? I was thinking that y should just be the years, but then y would be of length n_features instead of n_samples. If y is to be of length n_samples, then what could y possibly be that is of length 5 (the number of samples in the data shown)? I am thinking I must transform this data in some way.

Answer 1

In machine learning, y represents the label or target of your data. That is, the correct answers for your training data (X).

If you want to learn some values corresponding to years, then those years will be your training data (X) and the correct values associated with them will be your targets (y).

Note that this fits the sizes you mentioned in your first paragraph: X will be of shape (n_samples, n_features) because it will have as many entries as you have years, and each entry will be of size 1 (you only have one feature, the year), and y will be of length n_samples because you have a value associated with each year.
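As a rough illustration of those shapes, here is a minimal sketch. The years and values are made up for the example (they are not the asker's data), and the variable names are only placeholders.

```python
import numpy as np

# Hypothetical data for illustration only: one value per year.
years = np.array([1972, 1976, 1980, 1984, 1988])
values = np.array([10.0, 12.5, 13.1, 15.8, 17.2])

# scikit-learn expects a 2-D X, so the single feature (the year)
# becomes one column: shape (n_samples, n_features) == (5, 1).
X = years.reshape(-1, 1)

# y stays 1-D with one target per sample: shape (n_samples,) == (5,).
y = values

print(X.shape, y.shape)
```

The reshape(-1, 1) call is what turns a flat list of years into the one-column matrix that fit(X, y) accepts.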

Answer 2

y is your target (what you want to predict). You pass it to fit when training and then get predictions like this:

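from sklearn import linear_model

# X_train: array of shape (n_samples, n_features)
# y_train: 1-D array of length n_samples (the targets)
clf = linear_model.SGDRegressor()
clf.fit(X_train, y_train)

# clf is now a trained model; X_to_predict needs the same
# number of features (columns) as X_train
y_predicted = clf.predict(X_to_predict)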
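For a version that runs end to end, the following is one possible sketch rather than the answerer's own code. The data is synthetic and invented for the example. The StandardScaler step is an addition: stochastic gradient descent is sensitive to feature scale, and raw year values near 2000 tend to make SGDRegressor unstable, so the years are standardized before fitting.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, made-up training data: a simple upward linear trend.
years = np.array([1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004])
values = 0.5 * (years - 1972) + 10.0

X_train = years.reshape(-1, 1).astype(float)  # shape (9, 1)
y_train = values                              # shape (9,)

# Standardize the year feature, then fit the SGD-based linear regressor.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X_train, y_train)

# Predict for future years; the input must also be 2-D with one column.
X_to_predict = np.array([[2008.0], [2012.0]])
y_predicted = model.predict(X_to_predict)
print(y_predicted)
```

Because SGD is an iterative, approximate optimizer, the predictions will be close to the underlying trend but not exact. With only a handful of samples, an exact solver such as LinearRegression may be a simpler choice.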

Answer 1
2 2015-05-22 07:18:11

Answer 2
2 2015-05-22 13:07:43
