Predicting price using a regression data model

I built a regression model to predict house prices from several independent variables, and I obtained a regression equation with coefficients. I used StandardScaler() to scale my variables before splitting the data set. Now I want to predict the house price for new values of the independent variables. Can I plug the raw values directly into my regression equation to calculate the price, or should I first pass them through StandardScaler()?

Yes, you need to preprocess the new values. If you scaled your training data and fitted a model to the scaled data, then any new data fed into the model must undergo the same preprocessing. This is standard practice, as it ensures that the model always receives input in the same form it was trained on. The caveat is that you should use transform instead of fit_transform, so that the new values are scaled with the mean and variance learned from the training data rather than statistics re-estimated from the new data.

The process might look as follows:

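from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean and variance from the training data
new_data = scaler.transform(new_data)    # reuse those statistics for the new values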

There is a detailed write-up on this topic in another thread that might be of interest to you.
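As a rough illustration of why the raw values cannot go straight into the regression equation, here is a minimal sketch. The training data, the feature meanings (area and number of rooms), and the choice of LinearRegression are assumptions made for the example, not details from the question.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: columns are area and number of rooms.
X_train = np.array([[50.0, 2.0], [80.0, 3.0], [120.0, 4.0], [150.0, 5.0]])
y_train = np.array([100000.0, 160000.0, 240000.0, 300000.0])

# The model is fitted on scaled features, so its coefficients
# are expressed in units of the scaled features.
scaler = StandardScaler()
model = LinearRegression().fit(scaler.fit_transform(X_train), y_train)

new_data = np.array([[100.0, 3.0]])
new_scaled = scaler.transform(new_data)

# Regression equation applied by hand to the scaled inputs.
by_hand = model.intercept_ + new_scaled @ model.coef_

# Same equation applied to the raw inputs: the coefficients do not
# match these units, so the result is not a meaningful price.
unscaled = model.intercept_ + new_data @ model.coef_

print(by_hand, model.predict(new_scaled), unscaled)
```

The hand calculation on the scaled inputs agrees with model.predict, while the calculation on the raw inputs gives a very different number.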

To answer your question: yes, you have to process your test input as well, but consider the following explanation.

StandardScaler() standardizes features by removing the mean and scaling to unit variance.
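To make the standardization concrete, the sketch below checks that the scaled values equal (x - mean) / standard deviation, using the mean_ and scale_ attributes stored on the fitted scaler. The small feature matrix is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training feature matrix: columns are area and number of rooms.
X_train = np.array([[50.0, 2.0], [80.0, 3.0], [120.0, 4.0], [150.0, 5.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# The fitted scaler keeps the per-feature mean and standard deviation
# learned from X_train; applying them by hand gives the same result.
manual = (X_train - scaler.mean_) / scaler.scale_
assert np.allclose(manual, X_train_scaled)

# A new observation is scaled with the stored training statistics,
# not with statistics computed from the new observation itself.
new_house = np.array([[100.0, 3.0]])
new_scaled = scaler.transform(new_house)
print(new_scaled)
```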

If you fit the scaler on the whole data set and then split it, the scaler uses all values, including those that end up in the test set, when computing the mean and variance.

Ideally, the test set should not be preprocessed together with the training data. This ensures there is no 'peeking ahead'. The training data should be preprocessed on its own, and once the model is built, the same preprocessing parameters fitted on the training set are applied to the test set, as though the test set did not exist before.
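The order of operations described above could be sketched as follows. The synthetic data, the feature meanings, the noise level, and the use of LinearRegression are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: price depends linearly on two features.
rng = np.random.default_rng(0)
X = rng.uniform([40.0, 1.0], [200.0, 6.0], size=(200, 2))  # area, rooms
y = 1500.0 * X[:, 0] + 10000.0 * X[:, 1] + rng.normal(0.0, 5000.0, size=200)

# 1. Split first, so the test rows never influence the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# 2. Fit the scaler on the training rows only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

# 3. Reuse the fitted scaler (transform, not fit_transform) on the test rows.
X_test_scaled = scaler.transform(X_test)

model = LinearRegression().fit(X_train_scaled, y_train)
test_r2 = model.score(X_test_scaled, y_test)

# 4. New inputs go through the same fitted scaler before prediction.
new_houses = np.array([[100.0, 3.0]])
predicted_price = model.predict(scaler.transform(new_houses))
print(test_r2, predicted_price)
```

Splitting before fitting the scaler is the design choice that matters here: the mean and variance then come from the training rows alone.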
