
How to use SGDRegressor in scikit-learn


I am trying to figure out how to properly use scikit-learn's SGDRegressor model. To fit it to a dataset, I need to call the method fit(X, y), where X is a NumPy array of shape (n_samples, n_features) and y is a 1-D NumPy array of length n_samples. I am trying to figure out what y is supposed to represent.

For instance, my data looks like this:


My features are years starting in 1972, and the values are the corresponding value for each year. I am trying to predict the values for future years such as 2008 or 2012. I am assuming that each row in my data should be a row/sample in X, where each element is the value for one year. In that case, what would y be? I was thinking that y should just be the years, but then y would have length n_features instead of n_samples. If y must have length n_samples, what could y possibly be that has length 5 (the number of samples in the data shown above)? I think I must transform this data in some way.

In machine learning, y represents the label, or target, of your data: the correct answers for your training data (X).

If you want to learn values that correspond to years, then the years are your training data (X) and the correct values associated with them are your targets (y).
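Since the question's data has one row per series and one column per year, a minimal sketch of turning that wide layout into (X, y) might look like the following. It assumes pandas is available, and the series names and numbers are made up for illustration (the original data was only shown as an image). `DataFrame.melt` turns each (series, year, value) cell into its own row, after which one series' years become X and its values become y.

```python
import pandas as pd

# Hypothetical wide table with made-up numbers: one row per series,
# one column per year.
wide = pd.DataFrame(
    {
        "series": ["a", "b"],
        1972: [1.0, 0.5],
        1976: [1.5, 0.7],
        1980: [2.1, 0.8],
        1984: [2.4, 1.1],
        1988: [3.0, 1.3],
    }
)

# Reshape to long format: one row per (series, year, value) observation.
long = wide.melt(id_vars="series", var_name="year", value_name="value")
long["year"] = long["year"].astype(int)

# For one series, the years are the inputs and the values are the targets.
one = long[long["series"] == "a"]
X = one[["year"]].to_numpy()  # double brackets keep it 2-D: shape (5, 1)
y = one["value"].to_numpy()   # 1-D: shape (5,)
print(X.shape, y.shape)
```

Each series then gets its own (X, y) pair; fitting one model per series is one possible design, assumed here rather than taken from the answer.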

Notice that this matches the shapes you mentioned in your first paragraph: X has shape (n_samples, n_features) because it has one entry per year, and each entry has size 1 (you have only one feature, the year); y has length n_samples because you have one value associated with each year.
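One practical consequence of the (n_samples, n_features) shape: a plain 1-D array of years cannot be passed as X, because scikit-learn cannot tell whether it means 5 samples of 1 feature or 1 sample of 5 features, and it raises a `ValueError`. A small sketch with hypothetical numbers, showing the rejection and the usual `reshape(-1, 1)` fix:

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

years = np.array([1972, 1976, 1980, 1984, 1988])  # hypothetical, shape (5,)
values = np.array([1.0, 1.5, 2.1, 2.4, 3.0])      # hypothetical, shape (5,)

# A 1-D X is ambiguous, so scikit-learn's input validation rejects it.
try:
    SGDRegressor().fit(years, values)
    rejected = False
except ValueError:
    rejected = True

# Reshaping to a single column makes each year its own sample with one feature.
X = years.reshape(-1, 1)
print(rejected, X.shape, values.shape)  # True (5, 1) (5,)
```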

y is your target (what you want to predict). You train the model on X and y, and then get predictions this way:

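from sklearn import linear_model

# X_to_train: array of shape (n_samples, n_features)
# y_to_train: array of shape (n_samples,), one target per sample
clf = linear_model.SGDRegressor()
clf.fit(X_to_train, y_to_train)

# clf is now a trained model
y_predicted = clf.predict(X_to_predict)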
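For a fully runnable version of the snippet above, here is a sketch with made-up data for a single series that predicts 2008 and 2012. One assumption goes beyond the answer: SGD is sensitive to feature scale, and raw year values around 2000 could make the default learning rate diverge, so the year is standardized with `StandardScaler` inside a pipeline before it reaches `SGDRegressor`.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data: made-up values for one series, not the asker's data.
years = np.array([1972, 1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004])
values = np.array([1.0, 1.5, 2.1, 2.4, 3.0, 3.4, 4.1, 4.5, 5.0])

X_train = years.reshape(-1, 1)  # (n_samples, 1): each year is one sample
y_train = values                # (n_samples,): the value for each year

# Standardize the year before SGD so the updates stay well-behaved.
model = make_pipeline(
    StandardScaler(),
    SGDRegressor(max_iter=1000, tol=1e-3, random_state=0),
)
model.fit(X_train, y_train)

# Future years must also be 2-D: one row per year to predict.
X_future = np.array([[2008], [2012]])
y_future = model.predict(X_future)
print(y_future)
```

The pipeline applies the same scaling learned from the training years to the future years automatically, which is why it is used here instead of scaling by hand.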

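Statement: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.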

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM