
Standardized coefs in regression with a categorical predictor: there's something wrong

From what I understand, standardized coefficients can be used as indices of effect size (with the possibility of applying rules of thumb such as Cohen's, 1988). I also understand that standardized coefficients are expressed in units of standard deviation, which makes them relatively close to a Cohen's d.

I also understand that one way of obtaining standardized coefficients is to standardize the data beforehand. Another is to use the std.coef function from the MuMIn package.

These two methods are equivalent when using a continuous predictor:

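library(tidyverse)
library(MuMIn) # for std.coef()

# Standardize the two numeric columns, then re-attach the Species factor
df <- iris %>%
  select(Sepal.Length, Sepal.Width) %>%
  scale() %>%
  as.data.frame() %>%
  mutate(Species = iris$Species)

# Method 1: coefficients of a model fitted to the pre-standardized data
fit <- lm(Sepal.Length ~ Sepal.Width, data=df)
round(coef(fit), 2)
# Method 2: coefficients standardized after fitting
round(MuMIn::std.coef(fit, partial.sd = TRUE), 2)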

In both cases, the coefficient is -0.12. I interpret it as follows: for each increase of 1 standard deviation in Sepal.Width, Sepal.Length decreases by 0.12 of its SD.
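As a quick check on that interpretation, here is a minimal base R sketch (no MuMIn, and working from the unscaled iris data, which is an assumption of this example rather than part of the original code). It rescales the raw slope by sd(x)/sd(y) and compares it with the Pearson correlation, which is what a standardized slope reduces to when there is a single numeric predictor. The names fit_raw, b_std and r are introduced only for illustration.

```r
# Base R only, using the unscaled iris data.
fit_raw <- lm(Sepal.Length ~ Sepal.Width, data = iris)

# Rescale the raw slope: multiply by sd(x) and divide by sd(y).
b_std <- coef(fit_raw)[["Sepal.Width"]] * sd(iris$Sepal.Width) / sd(iris$Sepal.Length)

# With a single numeric predictor this equals the Pearson correlation.
r <- cor(iris$Sepal.Length, iris$Sepal.Width)

round(c(b_std = b_std, r = r), 2)  # both round to -0.12
```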

And yet, these two methods give different results with a categorical predictor:

fit <- lm(Sepal.Length ~ Species, data=df)
round(coef(fit), 2)
round(MuMIn::std.coef(fit, partial.sd = TRUE), 2)

This gives, for the effect of versicolor compared with setosa (the reference level, absorbed into the intercept), 1.12 from coef() and 0.46 from std.coef().

Which one should I trust in order to say "the difference between versicolor and setosa is ... of Sepal.Length's SD"? Thanks a lot.

You didn't standardize the implicit dummy variables associated with Species, so those coefficients are not standardized.
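To make those implicit variables visible, the following sketch (base R, assuming the default treatment contrasts; X and dummy_sd are names made up for this example) prints the design matrix that lm() builds for Species and the standard deviation of each 0/1 dummy column.

```r
# Design matrix that lm() builds for a factor under default treatment contrasts.
X <- model.matrix(~ Species, data = iris)

# One row per species: setosa is the reference level, so both dummies are 0 for it.
X[c(1, 51, 101), ]

# The dummy columns are 0/1 indicators, so their SD is not 1.
dummy_sd <- apply(X[, -1], 2, sd)
round(dummy_sd, 3)
```

Because these columns are left as 0/1, their coefficients measure a change from one group to another, not a change of one standard deviation of the predictor.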

You could do so as follows:

# Expand the contrast matrix to one row per observation, then center and scale each dummy column
dummies <- scale(contrasts(df$Species)[df$Species,])
fit <- lm(Sepal.Length ~ dummies, data = df)
round(coef(fit), 2)
#      (Intercept) dummiesversicolor  dummiesvirginica 
#             0.00              0.53              0.90 

This agrees with the results of MuMIn::std.coef if you set the partial.sd argument to FALSE.
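The link between 1.12 and 0.53 can be worked out by hand. The sketch below uses base R only, and the names y, fit01, b01, by_hand and b_scaled are illustrative, not from the original code. With a standardized outcome and 0/1 dummies, each coefficient is the difference in group means divided by the overall SD of Sepal.Length. Scaling a dummy divides it by its own SD, so the coefficient gets multiplied by that SD.

```r
# Base R only; y is standardized, the Species dummies are left as 0/1.
y <- as.numeric(scale(iris$Sepal.Length))
fit01 <- lm(y ~ Species, data = iris)
b01 <- coef(fit01)[-1]  # differences from setosa, in SDs of Sepal.Length

# The same numbers by hand: difference in group means over the overall SD.
means <- sapply(split(iris$Sepal.Length, iris$Species), mean)
by_hand <- (means[c("versicolor", "virginica")] - means[["setosa"]]) / sd(iris$Sepal.Length)

# Scaling a dummy divides it by its SD, so its coefficient is multiplied by that SD.
X <- model.matrix(~ Species, data = iris)[, -1]
b_scaled <- b01 * apply(X, 2, sd)

round(b01, 2)       # versicolor: 1.12
round(b_scaled, 2)  # versicolor: 0.53, virginica: 0.90
```

Read this way, the 1.12 figure is the one that completes the sentence "the difference between versicolor and setosa is ... of Sepal.Length's SD", while 0.53 is the change in Sepal.Length (in SDs) per one SD of the versicolor indicator.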
