
Should I normalize the dependent variable in a penalized linear regression model?

Original post 2020-12-10 14:06:41 6 1 r / regression / lasso-regression

When I fit penalized regression models (lasso, ridge, and elastic net) with the glmnet package in R without normalizing the data, the resulting lambda values and RMSE are unreasonably large; the RMSE is in the thousands. However, when I normalize the response variable, the lambda, RMSE, and R^2 values are all within a reasonable range (< 1). Are we supposed to normalize the response variable? I tried scaling the predictor variables, but that still produces large values for lambda, RMSE, and R^2. The response variable is numeric: the number of shares of online articles, with values ranging from 1 to 840,000 and a mean of 3500.

1 Answer

RMSE is a measure of the prediction error of the model. It is on the same scale as your dependent variable and can only be interpreted in relation to that scale. You can get a lower RMSE by rescaling your dependent variable, but with a linear rescaling the RMSE drops only because you changed the scale of the dependent variable, not because the model predicts any better. An RMSE in the thousands does not seem unreasonable if your values are in the range 1-840,000 with a mean of 3500.
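A small illustration of this point, as a sketch rather than a reproduction of the glmnet setup: the Python snippet below (standard library only) fits a plain one-predictor least-squares line, not a penalized model, to made-up data on a "thousands" scale, then refits after standardizing the response. The data, the function names, and every number in it are assumptions chosen purely for illustration.

```python
import math
import random


def fit_simple_ols(x, y):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse_and_r2(x, y):
    """In-sample RMSE and R^2 of the least-squares line."""
    intercept, slope = fit_simple_ols(x, y)
    n = len(y)
    mean_y = sum(y) / n
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return math.sqrt(ss_res / n), 1 - ss_res / ss_tot


# Hypothetical data: one predictor and a noisy response in the thousands.
random.seed(0)
x = [random.uniform(0, 10) for _ in range(200)]
y = [3500 + 400 * xi + random.gauss(0, 2000) for xi in x]

# Standardize the response (a linear rescaling): subtract mean, divide by sd.
mean_y = sum(y) / len(y)
sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / len(y))
y_std = [(yi - mean_y) / sd_y for yi in y]

rmse_raw, r2_raw = rmse_and_r2(x, y)
rmse_std, r2_std = rmse_and_r2(x, y_std)

print(f"raw response:          RMSE = {rmse_raw:.1f}, R^2 = {r2_raw:.4f}")
print(f"standardized response: RMSE = {rmse_std:.4f}, R^2 = {r2_std:.4f}")
print(f"raw RMSE / sd(y)     = {rmse_raw / sd_y:.4f}")
```

In this unpenalized sketch the RMSE after standardizing equals the raw RMSE divided by the standard deviation of the response, and R^2 is unchanged, so an RMSE below 1 on the standardized scale says nothing new about fit quality. With a penalty term the details may differ, since the effective amount of shrinkage can also depend on the scale of the response.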


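Notice: Technical posts on this site follow the CC BY-SA 4.0 license. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.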
