Stan Polynomial Regression Parameter Estimation Model Review

I have the following polynomial regression model:


$Y_i \mid \mu_i, \sigma^2 \sim \text{Normal}(\mu_i, \sigma^2), \quad i = 1, \dots, n \ \text{independent}$

$\mu_i = \alpha + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i1}^2 + \beta_4 x_{i2}^2 + \beta_5 x_{i1} x_{i2}$

$\alpha \sim \text{some suitable prior}$

$\beta_1, \dots, \beta_5 \sim \text{some suitable prior}$

$\sigma^2 \sim \text{some suitable prior}$

I want to take as input the sample size and the vectors of observations on $y_i$, $x_{i1}$ and $x_{i2}$. The code is as follows:

data{
  int<lower=1> n;
  vector[n] x1;
  vector[n] x2;
  vector[n] y;
}

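I want to standardize (center and scale) the two input variables to obtain the standardized regressors x1_std and x2_std. The code for this is in the transformed data block, as follows: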

transformed data{
  real bar_x1;
  real x1_sd;
  vector[n] x1_std;
  real bar_x2;
  real x2_sd;
  vector[n] x2_std;
  real y_sd;

  bar_x1 = mean(x1);
  x1_sd = sd(x1);
  x1_std = (x1 - bar_x1)/x1_sd; // centered and scaled

  bar_x2 = mean(x2);
  x2_sd = sd(x2);
  x2_std = (x2 - bar_x2)/x2_sd; // centered and scaled

  y_sd = sd(y);
}
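To check the standardization outside Stan, here is a minimal Python sketch of the same centering and scaling. The `standardize` helper and the toy numbers are hypothetical. It relies on `statistics.stdev`, which, like Stan's `sd()`, divides by n - 1, so the standardized values should have mean 0 and sample standard deviation 1:

```python
from statistics import mean, stdev


def standardize(xs):
    """Center and scale xs, mirroring the transformed data block.

    Returns the standardized values plus the mean and SD, so the
    back-transformation to the original scale can reuse them.
    """
    m = mean(xs)
    s = stdev(xs)  # sample SD with n - 1 denominator, like Stan's sd()
    return [(x - m) / s for x in xs], m, s


if __name__ == "__main__":
    # Hypothetical toy values, not the hills data.
    x1 = [1.0, 2.0, 3.0, 4.0]
    x1_std, bar_x1, x1_sd = standardize(x1)
    print(bar_x1, x1_sd)
    print(mean(x1_std), stdev(x1_std))  # approximately 0 and 1
```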

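I then want to use the standardized regressors to estimate the regression parameters $\alpha$ and $\beta_1, \dots, \beta_5$ of the polynomial regression model above, on both the original and the standardized scale.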

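Based on this, if I'm not mistaken, the formulas for transforming the parameters from the standardized scale back to the original scale are as follows, where $\tilde{\alpha}$ and $\gamma_1, \dots, \gamma_5$ are the intercept and coefficients on the standardized scale (alpha_std and beta1_std, ..., beta5_std in the code), and $\bar{x}_j$ and $s_j$ are the sample mean and standard deviation of $x_j$ (bar_x1, x1_sd, bar_x2, x2_sd):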

$\alpha = \tilde{\alpha} - \dfrac{\gamma_1}{s_1}\bar{x}_1 - \dfrac{\gamma_2}{s_2}\bar{x}_2 + \dfrac{\gamma_3}{s_1^2}\bar{x}_1^2 + \dfrac{\gamma_4}{s_2^2}\bar{x}_2^2 + \dfrac{\gamma_5}{s_1 s_2}\bar{x}_1\bar{x}_2$

$\beta_1 = \dfrac{\gamma_1}{s_1} - 2\dfrac{\gamma_3}{s_1^2}\bar{x}_1 - \dfrac{\gamma_5}{s_1 s_2}\bar{x}_2$

$\beta_2 = \dfrac{\gamma_2}{s_2} - 2\dfrac{\gamma_4}{s_2^2}\bar{x}_2 - \dfrac{\gamma_5}{s_1 s_2}\bar{x}_1$

$\beta_3 = \dfrac{\gamma_3}{s_1^2}$

$\beta_4 = \dfrac{\gamma_4}{s_2^2}$

$\beta_5 = \dfrac{\gamma_5}{s_1 s_2}$
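Expanding each standardized term shows where every piece of these formulas comes from. A short derivation, writing $z_{ij} = (x_{ij} - \bar{x}_j)/s_j$ for the standardized regressors (notation introduced here) and taking the standardized-scale mean to be $\tilde{\alpha} + \gamma_1 z_{i1} + \gamma_2 z_{i2} + \gamma_3 z_{i1}^2 + \gamma_4 z_{i2}^2 + \gamma_5 z_{i1} z_{i2}$, matching how mu is computed from x1_std and x2_std in the Stan model:

```latex
\begin{align*}
\gamma_1 z_{i1} &= \frac{\gamma_1}{s_1} x_{i1} - \frac{\gamma_1}{s_1}\bar{x}_1, \\
\gamma_3 z_{i1}^2 &= \frac{\gamma_3}{s_1^2}\left(x_{i1}^2 - 2\bar{x}_1 x_{i1} + \bar{x}_1^2\right), \\
\gamma_5 z_{i1} z_{i2} &= \frac{\gamma_5}{s_1 s_2}\left(x_{i1} x_{i2} - \bar{x}_2 x_{i1} - \bar{x}_1 x_{i2} + \bar{x}_1 \bar{x}_2\right),
\end{align*}
and analogously for $\gamma_2 z_{i2}$ and $\gamma_4 z_{i2}^2$.
Collecting the constant terms gives $\alpha$; collecting the coefficients of
$x_{i1}$, $x_{i2}$, $x_{i1}^2$, $x_{i2}^2$ and $x_{i1} x_{i2}$ gives
$\beta_1, \dots, \beta_5$ respectively.
```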

The code that implements this goes in the generated quantities block, as follows:

alpha = alpha_std - beta1_std*bar_x1/x1_sd - beta2_std*bar_x2/x2_sd
      + (beta3_std*bar_x1^2)/x1_sd^2 + (beta4_std*bar_x2^2)/x2_sd^2
      + (beta5_std*bar_x2*bar_x1)/(x1_sd*x2_sd);

beta1 = beta1_std/x1_sd - 2*beta3_std*bar_x1/x1_sd^2
      - beta5_std*bar_x2/(x1_sd*x2_sd);

beta2 = beta2_std/x2_sd - 2*beta4_std*bar_x2/x2_sd^2
      - beta5_std*bar_x1/(x1_sd*x2_sd);

beta3 = beta3_std/x1_sd^2;

beta4 = beta4_std/x2_sd^2;

beta5 = beta5_std/(x1_sd*x2_sd);
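As a numerical sanity check of the generated quantities code, here is a minimal sketch in Python (used instead of Stan only so it runs standalone). The helper names `mu`, `to_original_scale` and `max_discrepancy` are hypothetical; `to_original_scale` copies the six formulas above, with `g[0]` through `g[4]` standing for `beta1_std` through `beta5_std`. If the formulas are right, the quadratic evaluated on the standardized scale and on the original scale should agree to floating-point round-off for arbitrary coefficients, means, SDs and points:

```python
import random


def mu(a, b, x1, x2):
    """Quadratic mean function with intercept a and coefficients b[0..4]."""
    return a + b[0]*x1 + b[1]*x2 + b[2]*x1**2 + b[3]*x2**2 + b[4]*x1*x2


def to_original_scale(alpha_std, g, bar_x1, x1_sd, bar_x2, x2_sd):
    """Map standardized-scale (alpha_std, g[0..4]) to original-scale coefficients."""
    g1, g2, g3, g4, g5 = g
    alpha = (alpha_std - g1*bar_x1/x1_sd - g2*bar_x2/x2_sd
             + g3*bar_x1**2/x1_sd**2 + g4*bar_x2**2/x2_sd**2
             + g5*bar_x1*bar_x2/(x1_sd*x2_sd))
    beta1 = g1/x1_sd - 2*g3*bar_x1/x1_sd**2 - g5*bar_x2/(x1_sd*x2_sd)
    beta2 = g2/x2_sd - 2*g4*bar_x2/x2_sd**2 - g5*bar_x1/(x1_sd*x2_sd)
    beta3 = g3/x1_sd**2
    beta4 = g4/x2_sd**2
    beta5 = g5/(x1_sd*x2_sd)
    return alpha, [beta1, beta2, beta3, beta4, beta5]


def max_discrepancy(seed=1, trials=1000):
    """Largest |mu on standardized scale - mu on original scale| over random cases."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        bar_x1, bar_x2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
        x1_sd, x2_sd = rng.uniform(0.5, 3), rng.uniform(0.5, 3)
        alpha_std = rng.gauss(0, 1)
        g = [rng.gauss(0, 1) for _ in range(5)]
        alpha, beta = to_original_scale(alpha_std, g, bar_x1, x1_sd, bar_x2, x2_sd)
        x1, x2 = rng.uniform(-10, 10), rng.uniform(-10, 10)
        z1, z2 = (x1 - bar_x1)/x1_sd, (x2 - bar_x2)/x2_sd
        worst = max(worst, abs(mu(alpha_std, g, z1, z2) - mu(alpha, beta, x1, x2)))
    return worst


if __name__ == "__main__":
    print(max_discrepancy())  # should be at floating-point round-off level
```

A sign or power error in any of the six formulas would typically show up as a discrepancy far larger than round-off.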

My full model is as follows:

data{
  int<lower=1> n;
  vector[n] x1;
  vector[n] x2;
  vector[n] y;
}
transformed data{
  real bar_x1;
  real x1_sd;
  vector[n] x1_std;
  real bar_x2;
  real x2_sd;
  vector[n] x2_std;
  real y_sd;

  bar_x1 = mean(x1);
  x1_sd = sd(x1);
  x1_std = (x1 - bar_x1)/x1_sd; // centered and scaled

  bar_x2 = mean(x2);
  x2_sd = sd(x2);
  x2_std = (x2 - bar_x2)/x2_sd; // centered and scaled

  y_sd = sd(y);
}
parameters{
  real<lower=0> sigma;
  real alpha_std;
  real beta1_std;
  real beta2_std;
  real beta3_std;
  real beta4_std;
  real beta5_std;
}
transformed parameters {
  vector[n] mu;

  for(i in 1:n) {
    mu[i] = alpha_std + beta1_std*x1_std[i]
      + beta2_std*x2_std[i] + beta3_std*x1_std[i]^2
      + beta4_std*x2_std[i]^2 + beta5_std*x1_std[i]*x2_std[i];
  }
}
model{
  alpha_std ~ normal(0, 10);
  beta1_std ~ normal(0, 2.5);
  beta2_std ~ normal(0, 2.5);
  beta3_std ~ normal(0, 2.5);
  beta4_std ~ normal(0, 2.5);
  beta5_std ~ normal(0, 2.5);
  sigma ~ exponential(1 / y_sd);

  y ~ normal(mu, sigma);
}
generated quantities {
  real alpha;
  real beta1;
  real beta2;
  real beta3;
  real beta4;
  real beta5;
  
  alpha = alpha_std - beta1_std*bar_x1/x1_sd - beta2_std*bar_x2/x2_sd
      + (beta3_std*bar_x1^2)/x1_sd^2 + (beta4_std*bar_x2^2)/x2_sd^2
      + (beta5_std*bar_x2*bar_x1)/(x1_sd*x2_sd);

  beta1 = beta1_std/x1_sd - 2*beta3_std*bar_x1/x1_sd^2
      - beta5_std*bar_x2/(x1_sd*x2_sd);

  beta2 = beta2_std/x2_sd - 2*beta4_std*bar_x2/x2_sd^2
      - beta5_std*bar_x1/(x1_sd*x2_sd);

  beta3 = beta3_std/x1_sd^2;

  beta4 = beta4_std/x2_sd^2;

  beta5 = beta5_std/(x1_sd*x2_sd);
}

I'm using the hills dataset from R's MASS package:

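library(MASS)
library(rstan)
hills[18, 3] <- 18.65 # Fixing transcription error
x1 <- hills$dist  # distance (miles)
x2 <- hills$climb # total height gained (feet)
y <- hills$time   # record time (minutes)
n <- length(x1)
data.in <- list(x1 = x1, x2 = x2, y = y, n = n)
# `example` is the compiled Stan model above (e.g. the result of stan_model())
model.fit <- sampling(example, data.in)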

Now I print the standardized (alpha_std, beta1_std, beta2_std, beta3_std, beta4_std, beta5_std) and original-scale (alpha, beta1, beta2, beta3, beta4, beta5) regression parameters:

print(model.fit, pars = c("alpha_std", "alpha", "beta1_std", "beta2_std", "beta3_std", "beta4_std", "beta5_std", "beta1", "beta2", "beta3", "beta4", "beta5", "sigma"), probs = c(0.05, 0.5, 0.95), digits = 5)

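Have I approached this correctly? I've double- and triple-checked the math, so I think it should be right. Still, one thing that worries me is that beta4 is 0.00000. Does this indicate that I made a mistake? As I said, I've checked all my code and math, and as far as I can tell everything looks fine.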

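OK, I just found that the problem was that I wasn't printing the values with enough digits (5 is not enough) to see that the value is not 0.00000. Everything else is fine.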
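Why beta4 in particular: on the original scale $\beta_4 = \gamma_4 / s_2^2$, and $s_2$ is the standard deviation of climb, which is measured in feet, so dividing by $s_2^2$ typically makes beta4 very small. A value that small prints as 0.00000 at five decimal places even when it is not zero. A minimal Python illustration contrasting fixed decimal places with significant digits (the helper name and the numbers 0.5 and 1500 are hypothetical, not taken from the actual fit):

```python
def fixed_vs_significant(x, places=5):
    """Format x with `places` decimal places and with `places` significant digits."""
    return f"{x:.{places}f}", f"{x:.{places}g}"


if __name__ == "__main__":
    # Hypothetical: a standardized coefficient of 0.5 divided by an SD of 1500 squared.
    beta4 = 0.5 / 1500.0**2
    print(fixed_vs_significant(beta4))  # ('0.00000', '2.2222e-07')
```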

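Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.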

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM