How does AICc deal with nominal vs. numerical variables in linear models?
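I have a question about how numerical and nominal variables are handled in linear models that I want to compare using the second-order Akaike information criterion for small n (AICc, package 'MuMIn').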
Here is some made-up data and the preparation code:
library(MASS)
library(MuMIn)
set.seed(123)
treatments <- c(rep(paste0('t', 1:6), each = 3)) # nominal variable
x <- abs(rnorm(mean = 9500,n = 18,sd = 20000)) # observation
var3 <- runif(n=18, min = 100, max=1000)
var2 <- rnorm(n = 18, mean = 50)
var1 <- c(runif(n=3, min = 80, max=100), # numerical dummy variable for t1
runif(n=3, min = 65, max=85), # t2
runif(n=3, min = 75, max=90), # t3
runif(n=3, min = 15, max=50), # t4
runif(n=3, min = 0, max=20), # t5
runif(n=3, min = 30, max=60)) #t6
boxplot(var1~treatments) # well-separated for each treatment: use as dummy
dat <- data.frame(x, var1, var2, var3, treatments)
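Explanation: we have one observed variable, and we want to know the effect of treatments 1–6. The data contain a nominal variable for the different treatments and, by chance, a numerical variable that can serve as a dummy/proxy for the individual treatments.
Here is the linear modelling: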
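To see what the two codings cost in parameters, here is a minimal sketch on hypothetical toy data (not the data above), assuming R's default treatment contrasts. lm expands a nominal variable with six levels into an intercept plus five 0/1 indicator columns, whereas a numerical proxy enters as a single slope column:

```r
# Hypothetical toy data with the same layout as the question (6 treatments x 3)
treatments <- factor(rep(paste0("t", 1:6), each = 3))
var1 <- seq(10, 95, length.out = 18)  # arbitrary numeric proxy, for illustration only

# Nominal coding: intercept + 5 indicator columns (t1 is the reference level)
X_nominal <- model.matrix(~ treatments)
# Numeric coding: intercept + 1 slope column
X_numeric <- model.matrix(~ var1)

ncol(X_nominal)      # 6
ncol(X_numeric)      # 2
colnames(X_nominal)  # "(Intercept)" "treatmentst2" ... "treatmentst6"
```

So the factor version estimates four more coefficients than the single numeric proxy, and that difference is what an information criterion penalises.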
lm.nominal.1 <- lm(formula = x~treatments, data = dat)
qqnorm(rstudent(lm.nominal.1)); qqline(rstudent(lm.nominal.1)) # does not look good
plot(rstudent(lm.nominal.1)~fitted(lm.nominal.1)) ; abline(h=0, col='red') # same here
# so let's log-transform:
lm.nominal.1.log <- lm(formula = log(x)~treatments, data = dat)
qqnorm(rstudent(lm.nominal.1.log)); qqline(rstudent(lm.nominal.1.log)) # much better
plot(rstudent(lm.nominal.1.log)~fitted(lm.nominal.1.log)) ; abline(h=0, col='red') # same here
# ... in accordance with the above, use log(x) for all further models
lm.nominal.2.log <- lm(formula = log(x)~treatments+var2, data = dat)
lm.nominal.3.log <- lm(formula = log(x)~treatments+var2+var3, data = dat)
lm.numeric.1.log <- lm(formula = log(x)~var1, data = dat)
lm.numeric.2.log <- lm(formula = log(x)~var1+var2, data = dat)
lm.numeric.3.log <- lm(formula = log(x)~var1+var2+var3, data = dat)
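The number of estimated parameters k that AIC/AICc charges for each of these six models can be read off logLik(). The sketch below uses stand-in data with the same shape as the question (var1 here is a hypothetical uniform draw, since k does not depend on the values); k counts the regression coefficients plus one for the residual variance:

```r
set.seed(123)
# Stand-in data with the question's layout; values are illustrative only
dat <- data.frame(
  x          = abs(rnorm(18, mean = 9500, sd = 20000)),
  var1       = runif(18, min = 0, max = 100),
  var2       = rnorm(18, mean = 50),
  var3       = runif(18, min = 100, max = 1000),
  treatments = factor(rep(paste0("t", 1:6), each = 3))
)

models <- list(
  nominal.1 = lm(log(x) ~ treatments,               data = dat),
  nominal.2 = lm(log(x) ~ treatments + var2,        data = dat),
  nominal.3 = lm(log(x) ~ treatments + var2 + var3, data = dat),
  numeric.1 = lm(log(x) ~ var1,                     data = dat),
  numeric.2 = lm(log(x) ~ var1 + var2,              data = dat),
  numeric.3 = lm(log(x) ~ var1 + var2 + var3,       data = dat)
)

# k = number of coefficients + 1 (residual variance), as used by AIC/AICc
k <- sapply(models, function(m) attr(logLik(m), "df"))
k  # nominal models: 7, 8, 9; numeric models: 3, 4, 5
```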
And here is the Akaike criterion:
AICc.nominals <- AICc(lm.nominal.1.log, lm.nominal.2.log, lm.nominal.3.log)
AICc.nominals
AICc.numerics <- AICc(lm.numeric.1.log, lm.numeric.2.log, lm.numeric.3.log)
AICc.numerics
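As a base-R sketch of what AICc computes (the standard small-sample formula, which should agree with MuMIn's AICc), with k the df reported by logLik() and n the number of observations: AICc = -2 log L + 2k + 2k(k + 1)/(n - k - 1). The function names aicc_manual and correction are hypothetical helpers for illustration. With n = 18 the correction term grows quickly with k, so the factor-based models (k = 7 to 9) pay a much larger penalty than the single-proxy models (k = 3 to 5):

```r
# Second-order AIC computed from base R pieces only (hypothetical helper)
aicc_manual <- function(model) {
  ll <- logLik(model)
  k  <- attr(ll, "df")   # coefficients + residual variance
  n  <- nobs(model)
  -2 * as.numeric(ll) + 2 * k + 2 * k * (k + 1) / (n - k - 1)
}

# The small-sample correction term alone, for n = 18 as in the question
correction <- function(k, n = 18) 2 * k * (k + 1) / (n - k - 1)
correction(3)  # 24 / 14, about 1.71 (e.g. lm.numeric.1.log)
correction(7)  # 11.2 (e.g. lm.nominal.1.log)
correction(9)  # 22.5 (e.g. lm.nominal.3.log)
```

For example, aicc_manual(lm(mpg ~ factor(cyl), data = mtcars)) equals that model's AIC() plus correction(4, n = 32), since the three-level factor gives k = 4.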
AICc.all <- AICc(lm.nominal.1.log, lm.nominal.2.log, lm.nominal.3.log,
lm.numeric.1.log, lm.numeric.2.log, lm.numeric.3.log)
# Now further model / likelihood analysis:
AICc.all$Deltai <- AICc.all$AICc - min(AICc.all$AICc)  # AICc differences to the best model
AICc.all$Weights <- Weights(AICc.all$AICc)              # Akaike weights, reusing the values above
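The Delta_i and weight columns follow the usual definition of Akaike weights: w_i = exp(-Delta_i / 2) / sum_j exp(-Delta_j / 2). A minimal base-R sketch (akaike_weights is a hypothetical helper, and the three AICc values are made up) that should reproduce what MuMIn's Weights returns for the same vector:

```r
# Akaike weights from a vector of AICc (or AIC) values
akaike_weights <- function(ic) {
  delta <- ic - min(ic)     # Delta_i, like AICc.all$Deltai
  rel   <- exp(-delta / 2)  # relative likelihood of each model
  rel / sum(rel)            # normalise so the weights sum to 1
}

# Hypothetical AICc values for three models
w <- akaike_weights(c(100, 102, 110))
round(w, 3)
```

A model 2 units worse than the best keeps a substantial weight, while one 10 units worse gets almost none.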
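Is it valid to compare linear models that contain a numerical dummy variable with linear models that contain nominal variables? Or is that like comparing apples and oranges?
lm does the dummy coding internally. If you do it by hand, you get exactly the same result: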
fit1 <- lm(Sepal.Length ~ Species, iris)
fit2 <- lm(Sepal.Length ~ model.matrix(fit1), iris)
AIC(fit1, fit2)
# df AIC
#fit1 4 231.452
#fit2 4 231.452
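So yes, the comparison is fine: AICc only uses the log-likelihood and the number of estimated parameters, and both are the same whether a nominal variable is entered as a factor or as explicit dummy columns.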