standardize function in glmnet in R
library(glmnet)
set.seed(123)
n = 100
p = 20
x = matrix(rnorm(n * p, mean = 2, sd = 2), n, p)
y = rnorm(n)
lambda = 0.05
fit1 = glmnet(x,y, lambda = lambda)
beta1 = as.vector(coef(fit1, s = lambda, exact = TRUE))
beta1[which(abs(beta1) > 0)]
xsd = apply(x, 2, function(x) (x - mean(x))/sqrt(var(x) * (n - 1) / n))
fit2 = glmnet(xsd,y,lambda = lambda, standardize = FALSE)
beta2 = as.vector(coef(fit2, s = lambda, exact = TRUE))
beta2[which(abs(beta2) > 0)]
est.table = data.frame("beta1" = beta1[which(abs(beta1) > 0)], "beta2" = beta2[which(abs(beta2) > 0)])
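I would expect the two lasso problems solved by glmnet to give the same output: one fitted on the original data (standardize = TRUE) and the other on the standardized data (standardize = FALSE). Why are the outputs so different?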
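When you have standardize = TRUE, the coefficients are returned on the original scale. This means you can use them directly with the input matrix to get predictions.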
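As a minimal sketch of what "on the original scale" means in practice, the check below rebuilds the fitted values by hand from coef() and the raw x, then compares them with predict(). It reuses the simulated data from the question; the names fit, b, pred_manual and pred_glmnet are only illustrative.

```r
library(glmnet)

set.seed(123)
n <- 100
p <- 20
x <- matrix(rnorm(n * p, mean = 2, sd = 2), n, p)
y <- rnorm(n)

# Fit on the raw data; glmnet standardizes internally
fit <- glmnet(x, y, lambda = 0.05, standardize = TRUE)

# coef() returns the intercept first, followed by the p slopes
b <- as.vector(coef(fit))

# Manual prediction using the raw, unstandardized x
pred_manual <- as.vector(cbind(1, x) %*% b)

# Prediction through glmnet's own predict method
pred_glmnet <- as.vector(predict(fit, newx = x))

# Should be TRUE if the coefficients are on the scale of the raw x
all.equal(pred_manual, pred_glmnet)
```

No rescaling of x is needed before multiplying, which is the point of the statement above.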
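In your fit with standardize = FALSE, the input columns have already been divided by their standard deviations, so the coefficients come out scaled up by those standard deviations.
To compare them with the standardize = TRUE fit, you need to divide the coefficients from the standardize = FALSE fit by the standard deviation of each column: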
library(glmnet)
set.seed(123)
n = 100
p = 20
x = matrix(rnorm(n * p, mean = 2, sd = 2), n, p)
y = rnorm(n)
lambda = c(0.01,0.05,0.1,0.5,1)
fit1 = glmnet(x,y, lambda = lambda,standardize = TRUE)
beta1 = as.matrix(fit1$beta)
xsd = apply(x, 2, function(x) (x - mean(x))/sqrt(var(x) * (n - 1) / n))
fit2 = glmnet(xsd,y,lambda = lambda, standardize = FALSE)
beta2 = as.matrix(fit2$beta)
Now we get the sd of each column:
colsd = apply(x, 2, function(x)sqrt(var(x) * (n - 1) / n))
We divide the coefficients from the fit on the standardized data by this sd:
head(sweep(beta2,1,colsd,"/"))
   s0 s1         s2           s3           s4
V1  0  0 0.00000000 -0.014049634 -0.032142780
V2  0  0 0.00000000 -0.001181405 -0.026486241
V3  0  0 0.01605406  0.051932402  0.082018905
V4  0  0 0.00000000  0.000000000  0.000000000
V5  0  0 0.00000000  0.000000000  0.004122524
V6  0  0 0.00000000  0.000000000  0.000000000
And compare with the other regression:
head(beta1)
   s0 s1         s2           s3           s4
V1  0  0 0.00000000 -0.014049634 -0.032142780
V2  0  0 0.00000000 -0.001181405 -0.026486241
V3  0  0 0.01605406  0.051932402  0.082018905
V4  0  0 0.00000000  0.000000000  0.000000000
V5  0  0 0.00000000  0.000000000  0.004122524
V6  0  0 0.00000000  0.000000000  0.000000000
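The same comparison can be made numerically instead of by eye. The sketch below repeats both fits and reports the largest absolute difference between the two sets of slopes. It also back-transforms the intercept, which the answer does not cover: assuming the columns of xsd are centered as above, the intercept on the original scale should be the intercept of the standardized fit minus the sum of each column mean times its rescaled slope. The names beta2_rescaled and a0_back are illustrative.

```r
library(glmnet)

set.seed(123)
n <- 100
p <- 20
x <- matrix(rnorm(n * p, mean = 2, sd = 2), n, p)
y <- rnorm(n)
lambda <- c(0.01, 0.05, 0.1, 0.5, 1)

# Column sds with the 1/n convention, and the standardized copy of x
colsd <- apply(x, 2, function(v) sqrt(var(v) * (n - 1) / n))
xsd <- apply(x, 2, function(v) (v - mean(v)) / sqrt(var(v) * (n - 1) / n))

fit1 <- glmnet(x, y, lambda = lambda, standardize = TRUE)
fit2 <- glmnet(xsd, y, lambda = lambda, standardize = FALSE)

beta1 <- as.matrix(fit1$beta)

# Divide each row (one predictor) by that predictor's sd
beta2_rescaled <- sweep(as.matrix(fit2$beta), 1, colsd, "/")

# Largest slope discrepancy across all predictors and lambdas
max(abs(beta1 - beta2_rescaled))

# Intercept of fit2 moved back to the original scale (assumption: xsd is centered)
a0_back <- fit2$a0 - as.vector(colMeans(x) %*% beta2_rescaled)
max(abs(fit1$a0 - a0_back))
```

Both differences should be close to zero, up to the solver's convergence tolerance. The intercept step also explains part of the mismatch in the question, where coef() includes the intercept in the compared vectors.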