Stan Polynomial Regression Parameter Estimation Model Review
I have the following polynomial regression model:
$Y_i | \mu_i, \sigma^2 \sim \text{Normal}(\mu_i, \sigma^2), \quad i = 1, \dots, n, \ \text{independent}$
$\mu_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1}^2 + \beta_4 x_{i2}^2 + \beta_5 x_{i1} x_{i2}$
$\alpha \sim \text{some suitable prior}$
$\beta_1, \dots, \beta_5 \sim \text{some suitable prior}$
$\sigma^2 \sim \text{some suitable prior}$
I want to take the sample size and the vectors of observations on $y_i$, $x_{i1}$ and $x_{i2}$ as input. The code is as follows:
data {
  int<lower=1> n;
  vector[n] x1;
  vector[n] x2;
  vector[n] y;
}
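I want to standardize (center and scale) the two input variables to obtain the standardized regressors x1_std and x2_std. The code for this goes in the transformed data block, as follows: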
transformed data {
  real bar_x1;
  real x1_sd;
  vector[n] x1_std;
  real bar_x2;
  real x2_sd;
  vector[n] x2_std;
  real y_sd;
  bar_x1 = mean(x1);
  x1_sd = sd(x1);
  x1_std = (x1 - bar_x1)/x1_sd; // centered and scaled
  bar_x2 = mean(x2);
  x2_sd = sd(x2);
  x2_std = (x2 - bar_x2)/x2_sd; // centered and scaled
  y_sd = sd(y);
}
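As a quick sanity check outside Stan, the same centering and scaling can be reproduced in R. This is only a sketch: it uses the hills data that appears later in this post, and it assumes that R's sd() and Stan's sd() use the same definition of the standard deviation, which is worth confirming in the Stan functions reference.

```r
library(MASS)  # provides the hills data used later in this post

x1 <- hills$dist
x1_std <- (x1 - mean(x1)) / sd(x1)   # same formula as in transformed data

# After centering and scaling, the sample mean is ~0 and the sample sd is 1
mean_ok <- isTRUE(all.equal(mean(x1_std), 0))
sd_ok   <- isTRUE(all.equal(sd(x1_std), 1))

# Base R's scale() performs the same centering and scaling by default
scale_ok <- isTRUE(all.equal(as.vector(scale(x1)), x1_std))

c(mean_ok = mean_ok, sd_ok = sd_ok, scale_ok = scale_ok)
```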
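I then want to regress on the standardized regressors and estimate the regression parameters $\alpha$ and $\beta_1, \dots, \beta_5$ of the polynomial regression model above, on both the original and the standardized scale.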
Based on this, if I am not mistaken, the formulas for transforming the standardized-scale parameters ($\tilde{\alpha}$ and $\gamma_1, \dots, \gamma_5$, i.e. alpha_std and beta1_std, ..., beta5_std in the code) back to the original scale are as follows, where $\bar{x}_j$ and $s_j$ are the sample mean and standard deviation of the $j$-th input:
$\alpha = \tilde{\alpha} - \dfrac{\gamma_1}{s_1}\bar{x}_1 - \dfrac{\gamma_2}{s_2}\bar{x}_2 + \dfrac{\gamma_3}{s_1^2}\bar{x}_1^2 + \dfrac{\gamma_4}{s_2^2}\bar{x}_2^2 + \dfrac{\gamma_5}{s_1 s_2}\bar{x}_1\bar{x}_2$
$\beta_1 = \dfrac{\gamma_1}{s_1} - 2\dfrac{\gamma_3}{s_1^2}\bar{x}_1 - \dfrac{\gamma_5}{s_1 s_2}\bar{x}_2$
$\beta_2 = \dfrac{\gamma_2}{s_2} - 2\dfrac{\gamma_4}{s_2^2}\bar{x}_2 - \dfrac{\gamma_5}{s_1 s_2}\bar{x}_1$
$\beta_3 = \dfrac{\gamma_3}{s_1^2}$
$\beta_4 = \dfrac{\gamma_4}{s_2^2}$
$\beta_5 = \dfrac{\gamma_5}{s_1 s_2}$
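For reference, these formulas can be derived by expanding the standardized model term by term. The sketch below assumes the notation $z_{ij} = (x_{ij} - \bar{x}_j)/s_j$ for the standardized inputs (x1_std and x2_std in the code), with $\tilde{\alpha}$ and $\gamma_1, \dots, \gamma_5$ denoting the standardized-scale parameters:
$\mu_i = \tilde{\alpha} + \gamma_1 z_{i1} + \gamma_2 z_{i2} + \gamma_3 z_{i1}^2 + \gamma_4 z_{i2}^2 + \gamma_5 z_{i1} z_{i2}$
$\gamma_1 z_{i1} = \dfrac{\gamma_1}{s_1} x_{i1} - \dfrac{\gamma_1}{s_1}\bar{x}_1$
$\gamma_3 z_{i1}^2 = \dfrac{\gamma_3}{s_1^2}\left( x_{i1}^2 - 2\bar{x}_1 x_{i1} + \bar{x}_1^2 \right)$
$\gamma_5 z_{i1} z_{i2} = \dfrac{\gamma_5}{s_1 s_2}\left( x_{i1} x_{i2} - \bar{x}_2 x_{i1} - \bar{x}_1 x_{i2} + \bar{x}_1 \bar{x}_2 \right)$
The expansions of $\gamma_2 z_{i2}$ and $\gamma_4 z_{i2}^2$ are the same with the indices 1 and 2 swapped. Collecting the constant terms gives $\alpha$, the coefficients of $x_{i1}$ and $x_{i2}$ give $\beta_1$ and $\beta_2$, and the coefficients of $x_{i1}^2$, $x_{i2}^2$ and $x_{i1} x_{i2}$ give $\beta_3$, $\beta_4$ and $\beta_5$.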
The code implementing this goes in the generated quantities block, as follows:
alpha = alpha_std - beta1_std*bar_x1/x1_sd - beta2_std*bar_x2/x2_sd
        + (beta3_std*bar_x1^2)/x1_sd^2 + (beta4_std*bar_x2^2)/x2_sd^2
        + (beta5_std*bar_x2*bar_x1)/(x1_sd*x2_sd);
beta1 = beta1_std/x1_sd - 2*beta3_std*bar_x1/x1_sd^2
        - beta5_std*bar_x2/(x1_sd*x2_sd);
beta2 = beta2_std/x2_sd - 2*beta4_std*bar_x2/x2_sd^2
        - beta5_std*bar_x1/(x1_sd*x2_sd);
beta3 = beta3_std/x1_sd^2;
beta4 = beta4_std/x2_sd^2;
beta5 = beta5_std/(x1_sd*x2_sd);
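One way to check a back-transformation like this without running the sampler is to verify numerically that the standardized-scale and original-scale polynomials give the same fitted mean. The R sketch below does that; the helper name to_original_scale and the coefficient values in g are hypothetical and chosen only for the check, they are not estimates from the model.

```r
library(MASS)  # provides the hills data

# Hypothetical helper: map standardized-scale coefficients to the original scale.
# g = c(alpha_std, beta1_std, ..., beta5_std); m1, s1, m2, s2 = means and sds of x1, x2.
to_original_scale <- function(g, m1, s1, m2, s2) {
  c(alpha = g[1] - g[2]*m1/s1 - g[3]*m2/s2 +
            g[4]*m1^2/s1^2 + g[5]*m2^2/s2^2 + g[6]*m1*m2/(s1*s2),
    beta1 = g[2]/s1 - 2*g[4]*m1/s1^2 - g[6]*m2/(s1*s2),
    beta2 = g[3]/s2 - 2*g[5]*m2/s2^2 - g[6]*m1/(s1*s2),
    beta3 = g[4]/s1^2,
    beta4 = g[5]/s2^2,
    beta5 = g[6]/(s1*s2))
}

x1 <- hills$dist
x2 <- hills$climb
m1 <- mean(x1); s1 <- sd(x1)
m2 <- mean(x2); s2 <- sd(x2)
z1 <- (x1 - m1)/s1
z2 <- (x2 - m2)/s2

g <- c(1, 0.5, -0.3, 0.2, 0.1, -0.4)   # arbitrary test values, not estimates
b <- to_original_scale(g, m1, s1, m2, s2)

# Mean function evaluated on the standardized scale ...
mu_std <- g[1] + g[2]*z1 + g[3]*z2 + g[4]*z1^2 + g[5]*z2^2 + g[6]*z1*z2
# ... and on the original scale with the back-transformed coefficients
mu_raw <- b[["alpha"]] + b[["beta1"]]*x1 + b[["beta2"]]*x2 +
          b[["beta3"]]*x1^2 + b[["beta4"]]*x2^2 + b[["beta5"]]*x1*x2

# The two should agree up to floating-point error if the formulas are right
all.equal(mu_std, mu_raw)
```

The same comparison could be repeated with posterior draws in place of g, but that depends on how the draws are extracted and is not shown here.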
My whole model is as follows:
data {
  int<lower=1> n;
  vector[n] x1;
  vector[n] x2;
  vector[n] y;
}
transformed data {
  real bar_x1;
  real x1_sd;
  vector[n] x1_std;
  real bar_x2;
  real x2_sd;
  vector[n] x2_std;
  real y_sd;
  bar_x1 = mean(x1);
  x1_sd = sd(x1);
  x1_std = (x1 - bar_x1)/x1_sd; // centered and scaled
  bar_x2 = mean(x2);
  x2_sd = sd(x2);
  x2_std = (x2 - bar_x2)/x2_sd; // centered and scaled
  y_sd = sd(y);
}
parameters {
  real<lower=0> sigma;
  real alpha_std;
  real beta1_std;
  real beta2_std;
  real beta3_std;
  real beta4_std;
  real beta5_std;
}
transformed parameters {
  vector[n] mu; // declared as a vector; newer Stan versions no longer accept the old real mu[n] array syntax
  for (i in 1:n) {
    mu[i] = alpha_std + beta1_std*x1_std[i]
            + beta2_std*x2_std[i] + beta3_std*x1_std[i]^2
            + beta4_std*x2_std[i]^2 + beta5_std*x1_std[i]*x2_std[i];
  }
}
model {
  alpha_std ~ normal(0, 10);
  beta1_std ~ normal(0, 2.5);
  beta2_std ~ normal(0, 2.5);
  beta3_std ~ normal(0, 2.5);
  beta4_std ~ normal(0, 2.5);
  beta5_std ~ normal(0, 2.5);
  sigma ~ exponential(1 / y_sd);
  y ~ normal(mu, sigma);
}
generated quantities {
  real alpha;
  real beta1;
  real beta2;
  real beta3;
  real beta4;
  real beta5;
  alpha = alpha_std - beta1_std*bar_x1/x1_sd - beta2_std*bar_x2/x2_sd
          + (beta3_std*bar_x1^2)/x1_sd^2 + (beta4_std*bar_x2^2)/x2_sd^2
          + (beta5_std*bar_x2*bar_x1)/(x1_sd*x2_sd);
  beta1 = beta1_std/x1_sd - 2*beta3_std*bar_x1/x1_sd^2
          - beta5_std*bar_x2/(x1_sd*x2_sd);
  beta2 = beta2_std/x2_sd - 2*beta4_std*bar_x2/x2_sd^2
          - beta5_std*bar_x1/(x1_sd*x2_sd);
  beta3 = beta3_std/x1_sd^2;
  beta4 = beta4_std/x2_sd^2;
  beta5 = beta5_std/(x1_sd*x2_sd);
}
I am using the hills dataset from R's MASS package:
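library(MASS)
library(rstan) # provides sampling()
hills[18, 3] <- 18.65 # Fixing transcription error
x1 <- hills$dist
x2 <- hills$climb
y <- hills$time
n <- length(x1)
data.in <- list(x1 = x1, x2 = x2, y = y, n = n)
# example is the compiled Stan model object for the model above (compilation step not shown)
model.fit <- sampling(example, data = data.in)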
I now print the standardized (alpha_std, beta1_std, beta2_std, beta3_std, beta4_std, beta5_std) and original-scale (alpha, beta1, beta2, beta3, beta4, beta5) regression parameters:
print(model.fit,
      pars = c("alpha_std", "alpha",
               "beta1_std", "beta2_std", "beta3_std", "beta4_std", "beta5_std",
               "beta1", "beta2", "beta3", "beta4", "beta5", "sigma"),
      probs = c(0.05, 0.5, 0.95), digits = 5)
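Have I approached this correctly? I have double- and triple-checked the math, so I believe it should be right. Still, one thing that worries me is that beta4 is printed as 0.00000. Does this indicate that I made a mistake? As I said, I have checked all of my code and math and, as far as I can tell, everything looks fine.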
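OK, I just found the problem: I was not printing the values with enough digits (5 is not enough) to see that the value is not actually 0.00000. Everything else is fine.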
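The reason beta4 is so small is visible in the transformation itself: beta4 = beta4_std/x2_sd^2, and climb takes values in the hundreds and thousands, so x2_sd^2 is very large. The base R sketch below illustrates the rounding effect; the two input numbers are hypothetical and chosen only to show the order of magnitude, they are not results from the fit.

```r
beta4_std <- 0.5     # hypothetical standardized coefficient
x2_sd     <- 1600    # hypothetical standard deviation of climb
beta4 <- beta4_std / x2_sd^2

round(beta4, 5)      # five decimal places are not enough: this rounds to 0
signif(beta4, 3)     # significant digits keep the leading non-zero figures
format(beta4, scientific = TRUE, digits = 3)  # scientific notation as a string
```

For the fitted model, passing a larger digits value to the same print call, or rescaling climb before fitting, are two ways to make such a coefficient readable.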