How do I get parameter estimates as the average, over 100 trials, of the parameter estimates for k subgroups of a dataset in R?

Question

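I am using R. My problem is as follows: for each of the k (equally sized) subgroups of my dataset I need to fit a linear model over 100 trials, and I then want the parameter estimates to be the mean of each subgroup's parameters over the 100 trials. I wrote the code below. I am not sure how to store, inside the two loops, the parameter estimates from each iteration that are needed to compute the mean. I used a list ("res"), but since I have to store a vector on every repetition, that may not be a good choice: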

# Define var-cov matrix
rho <- 0.5
row1 <- rho^(c(0:18))
row2 <- rho^(c(1,0:17))
row3 <- rho^(c(2:1,0:16))
row4 <- rho^(c(3:1,0:15))
row5 <- rho^(c(4:1,0:14))
row6 <- rho^(c(5:1,0:13))
row7 <- rho^(c(6:1,0:12)) 
row8 <- rho^(c(7:1,0:11))
row9 <- rho^(c(8:1,0:10))
row10 <- rho^(c(9:1,0:9))
row11 <- rho^(c(10:1,0:8))
row12 <- rho^(c(11:1,0:7))
row13 <- rho^(c(12:1,0:6))
row14 <- rho^(c(13:1,0:5))
row15 <- rho^(c(14:1,0:4))
row16 <- rho^(c(15:1,0:3))
row17 <- rho^(c(16:1,0:2))
row18 <- rho^(c(17:1,0:1))
row19 <- rho^(c(18:1,0))
S = round(rbind(row1,row2,row3,row4,row5,row6,row7,row8,row9,row10,row11,row12,row13,row14,row15,row16,row17,row18,row19),4)

library(tidyr)
library(MASS)  # provides mvrnorm(), which is called in the loop below
colnames(S) = c("X2","X3","X4","X5","X6","X7","X8","X9","X10","X11","X12","X13","X14","X15","X16","X17","X18","X19","X20")
rownames(S) = colnames(S)

# Make mean vector
mus = rep(1,19); names(mus) = colnames(S)

 res <- list()
 result <- list()
 for(ii in 1:100){ 
    df = mvrnorm(n = 1000, mu = mus, Sigma = S)
    beta <- c(1, runif(19, min = -2.5, max = 2.5))
    eps <- rnorm(1000, 0, 1)
    sigma <- 0.2*(norm(df*beta, type = '2')/norm(eps, type = '2'))
    y <- rowSums(df*beta + sigma*eps)
    df <- data.frame(cbind(y, df))
    ind = sample(rep(1:10,each = nrow(df)/10)) # split the dataset in k=10 subgroups
    k <-lapply(split(1:nrow(df),ind), function(i) df[i,])
    for(i in 1:10){
        fit <-lm(formula = y ~ X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19+X20, 
            data= k[[i]])
        res[[i]] <- fit$coefficients
                  }
        result[[ii]] <- mean(res[[i]])
      }

Can anyone help me? Thanks in advance.

Answer 1

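It may help to think about the structure you need. As far as I can tell, the result list can be computed after the coefficients have been combined. If you prefer to keep everything in a data.frame and track the simulation number and the split number, try the following: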

library(purrr)
library(MASS)
library(dplyr)
library(broom)

regform =as.formula('y ~ X2+X3+X4+X5+X6+X7+X8+X9+X10+X11+X12+X13+X14+X15+X16+X17+X18+X19+X20')

func = function(ii,mus,S,matrix=FALSE){

 df = mvrnorm(n = 1000, mu = mus, Sigma = S)
 beta <- c(1, runif(19, min = -2.5, max = 2.5))
 eps <- rnorm(1000, 0, 1)
 sigma <- 0.2*(norm(df*beta, type = '2')/norm(eps, type = '2'))
 y <- rowSums(df*beta + sigma*eps)
 df <- data.frame(cbind(y, df))
 df$ind = sample(rep(1:10,each = nrow(df)/10)) 
 
 df <- df %>% group_by(ind) %>% do(tidy(lm(regform,data=.))) %>% mutate(sim=ii)
 if(matrix){
     return(split(df$estimate,df$ind))
 }else{
     return(df)
   }  
} 
        
result = 1:100 %>% map_dfr(~func(.x,mus=mus,S=S,matrix=FALSE))

> head(result)
# A tibble: 6 x 7
# Groups:   ind [1]
    ind term        estimate std.error statistic p.value   sim
  <int> <chr>          <dbl>     <dbl>     <dbl>   <dbl> <int>
1     1 (Intercept)    13.7      13.3      1.02   0.309      1
2     1 X2            -11.1       5.51    -2.02   0.0467     1
3     1 X3              5.61      5.86     0.957  0.341      1
4     1 X4             -1.48      6.22    -0.239  0.812      1
5     1 X5             -3.82      5.89    -0.649  0.518      1
6     1 X6              2.88      5.95     0.485  0.629      1
> tail(result)
# A tibble: 6 x 7
# Groups:   ind [1]
    ind term  estimate std.error statistic p.value   sim
  <int> <chr>    <dbl>     <dbl>     <dbl>   <dbl> <int>
1    10 X15      11.9       6.41     1.85   0.0679   100
2    10 X16      -8.86      5.77    -1.54   0.128    100
3    10 X17       6.68      5.70     1.17   0.245    100
4    10 X18       3.73      5.81     0.641  0.523    100
5    10 X19      -5.28      5.55    -0.952  0.344    100
6    10 X20       1.14      5.40     0.211  0.833    100

As before, the mean of the coefficients you need is just a matter of grouping by sim and ind:

result %>% group_by(sim,ind) %>% summarize(estimate=mean(estimate))
# A tibble: 1,000 x 3
# Groups:   sim [100]
     sim   ind estimate
   <int> <int>    <dbl>
 1     1     1    0.800
 2     1     2    0.771
 3     1     3    0.807
 4     1     4    0.277
 5     1     5    0.632
 6     1     6    0.788
 7     1     7    0.878
 8     1     8    0.987
 9     1     9    0.764
10     1    10    0.611
# … with 990 more rows
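
The summary above averages all coefficients of one fit into a single number per sim and ind. If the goal is instead the average of each individual coefficient over the simulations, separately for every subgroup, one option is to group by ind and term. The following is only a sketch: toy_result is a made-up stand-in with the same columns as result (it is not output from the simulation above), and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for `result`: 3 simulations x 2 subgroups x 2 terms.
# The estimates are invented so the averages can be checked by hand.
toy_result <- expand.grid(
  term = c("(Intercept)", "X2"),
  ind  = 1:2,
  sim  = 1:3,
  stringsAsFactors = FALSE
)
toy_result$estimate <- toy_result$sim + ifelse(toy_result$term == "X2", 10, 0)

# Average each coefficient over the simulations, separately per subgroup.
coef_means <- toy_result %>%
  group_by(ind, term) %>%
  summarize(estimate = mean(estimate), .groups = "drop")

print(coef_means)
```

With the real result, the same group_by(ind, term) call would give one row per subgroup and coefficient, that is 10 x 20 = 200 rows.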

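The above is what I consider cleaner and easier to keep track of. The downside is that it uses a data.frame, which can be expensive if you run a large number of regressions.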

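Another possibility is to skip the data.frame and keep only the estimates (with matrix=TRUE, func returns them as a list split by subgroup):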

result = map(1:100,~func(.x,mus=mus,S=S,matrix=TRUE))

And to get the means:

map(result,~map(.x,mean))
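
The two loops from the question can also be kept in base R if every coefficient vector is written into a preallocated three-dimensional array instead of a list. The sketch below is not the code from the question: it uses a deliberately small toy setup (4 trials, 5 subgroups, 2 predictors, fixed true coefficients), and the names est, mean_est, n_trials, n_groups and n_obs are chosen here for illustration only.

```r
set.seed(1)
n_trials <- 4    # toy value; the question uses 100
n_groups <- 5    # toy value; the question uses 10
n_obs    <- 100  # toy value; the question uses 1000
coef_names <- c("(Intercept)", "X2", "X3")

# est[coefficient, subgroup, trial] holds one coefficient vector per fit.
est <- array(
  NA_real_,
  dim = c(length(coef_names), n_groups, n_trials),
  dimnames = list(coef_names,
                  paste0("group", seq_len(n_groups)),
                  paste0("trial", seq_len(n_trials)))
)

for (ii in seq_len(n_trials)) {
  # Simulate a fresh toy dataset for this trial.
  df <- data.frame(X2 = rnorm(n_obs), X3 = rnorm(n_obs))
  df$y <- 1 + 2 * df$X2 - df$X3 + rnorm(n_obs)
  # Randomly assign the rows to equally sized subgroups.
  ind <- sample(rep(seq_len(n_groups), each = n_obs / n_groups))
  for (i in seq_len(n_groups)) {
    fit <- lm(y ~ X2 + X3, data = df[ind == i, ])
    est[, i, ii] <- coef(fit)
  }
}

# Average over the trial dimension: rows are coefficients, columns are subgroups.
mean_est <- apply(est, c(1, 2), mean)
print(round(mean_est, 2))
```

Indexing by both loop counters keeps every fit, unlike res[[i]] in the question, which is overwritten on each pass of the outer loop.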

Solution 1
1 accepted 2020-06-26 12:42:45
