
SAS Regression model RMSE - to score or not to score

Intro: I want to take a model fitted on one data set and apply it to another data set to find an RMSE.

Say I have a dataset "data100"

and run the following selection operation to determine the significant variables:

PROC REG DATA =data100;
model y= x0-x999 / selection=forward SLENTRY=.01;
run;quit;

It reports that x0 x10 x20 x30 x40 x50 x60 x70 x80 x90 are significant at <.0001. OK. Now I want to use this model on another data set, "data1000".

Why couldn't I then just use:

PROC REG DATA =data1000;
model y= x0 x10 x20 x30 x40 x50 x60 x70 x80 x90;
run;quit;

to determine the RMSE of the data1000 set?


The reason this came up is that a mentor told me to use:

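/* Fit the model on data100 and save the parameter estimates */
proc reg data=data100 outest=data100est;
model y= x0-x999;
run;quit;

/* Apply the data100 estimates to data1000; RESIDUAL stores residuals in MODEL1 */
proc score data=data1000 score=data100est out=data1000p residual type=parms;
var y x0-x999;
run;

/* Uncorrected sum of squares of the residuals */
proc univariate data=data1000P;
var model1;
output out=data1000stat uss=ss1;
run;

/* RMSE = sqrt(sum of squared residuals / 1000) */
data data1000stat;
set data1000stat;
rmse=sqrt(ss1/1000);
run;

proc print data=data1000stat;
run;quit;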

I'm very confused about this point. If anyone can clarify why, or even whether proc score is appropriate here, that would be great.

This is probably better asked on the Stats forum. But since you asked...

When you run the second REG statement, you are refitting the model. The estimated betas will be different from the betas you got in the first REG statement. You are rerunning the regression on data1000, so by construction you get the minimum RMSE that those predictors can achieve on those data.

The second method keeps the betas from the first regression and applies them to the second data set. The RMSE you calculate here shows how well the model fitted on the 100 data predicts the 1000 data.

In the end, both are informative. The difference between the two RMSEs shows you how well the 100 predict the 1000.
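
To make the distinction concrete, here is a minimal sketch in Python (not SAS) using only the standard library. Everything in it is an assumption made for illustration: the simulated data, the single predictor, and the helper names fit_ols, rmse and simulate. It fits a line on a 100-row sample, then computes the RMSE on a 1000-row sample twice: once with the betas frozen from the 100-row fit (what PROC SCORE does) and once after refitting on the 1000 rows (what a second PROC REG does).

```python
import math
import random


def fit_ols(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(xs, ys, intercept, slope):
    """sqrt(sum of squared residuals / n), the same form as sqrt(ss1/1000)."""
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / len(xs))


def simulate(n, rng):
    """Hypothetical data: y = 2 + 0.5*x + standard normal noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(0)
x100, y100 = simulate(100, rng)      # stands in for data100
x1000, y1000 = simulate(1000, rng)   # stands in for data1000

betas_100 = fit_ols(x100, y100)      # betas kept from the first regression
betas_1000 = fit_ols(x1000, y1000)   # betas from refitting on the second set

rmse_scored = rmse(x1000, y1000, *betas_100)   # analogue of PROC SCORE
rmse_refit = rmse(x1000, y1000, *betas_1000)   # analogue of a second PROC REG

print(f"scored with data100 betas: {rmse_scored:.4f}")
print(f"refit on data1000:         {rmse_refit:.4f}")
```

The refit value can never exceed the scored value on the same rows, because least squares picks the betas that minimize the sum of squared residuals for those rows. One caveat when comparing numbers in SAS: the Root MSE printed by PROC REG divides by the error degrees of freedom rather than by n, so it is not computed on quite the same basis as sqrt(ss1/1000).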
