
How do I perform a bootstrap in R and estimate the variance?

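The dataset "male.wt" is a collection of the weights of 100 male taxi customers. Use bootstrap sampling to estimate the variance of the population of males who use taxis.

I am trying to use the boot() function in R and I am completely confused. Here is the dataset I was given for this problem.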

malewt = structure(list(x = c(184.291514203183, 238.183299307855, 217.544606414151, 
233.931926116624, 229.12042611005, 243.881689583996, 259.230802242781, 
217.939619221934, 137.636923032685, 170.379447345948, 195.852641733122, 
185.832690963969, 186.676714564328, 215.711426139253, 186.413495533494, 
237.83223009147, 180.124153998503, 215.393108191779, 188.846039074142, 
142.373198101437, 233.234630310378, 186.141325709762, 220.062112044187, 
213.851199681057, 148.622198219149, 197.438771523918, 206.920961557603, 
190.874857845699, 217.889075914836, 152.318099234166, 218.089620221194, 
196.736930479919, 235.122424359223, 217.446826955801, 201.352404389309, 
216.290374765672, 173.85609629461, 215.961826427613, 213.87732008193, 
177.952521505061, 132.734879010504, 221.707886490889, 224.336488758995, 
218.604034088911, 228.157844234374, 196.544661577149, 228.787736646279, 
237.009125179319, 194.73342863066, 190.569523115323, 192.198491573128, 
204.589742888237, 198.662802876867, 195.238634847898, 201.834508205684, 
220.989134791548, 180.006492709174, 168.199898332071, 250.705048451896, 
209.824701073225, 212.36145906497, 205.250728119598, 196.572466206237, 
186.818746613236, 138.493748904934, 193.572713536688, 171.605082170236, 
243.803356964054, 188.768040728907, 201.408088256783, 196.23847341016, 
202.686141019735, 167.25735383257, 171.907526464761, 224.396425425799, 
183.494470842407, 220.15969728649, 143.164453849305, 152.539942653094, 
198.52004650272, 185.145815429412, 206.741840856439, 259.866591064748, 
135.212011256414, 164.2297511973, 200.623731663392, 199.599177980586, 
175.970651370212, 197.304554981825, 189.116019204125, 198.630618004183, 
185.096675814379, 203.780160863916, 174.584831373708, 150.483001599829, 
223.78078870159, 170.772181294322, 218.770812392057, 151.645084212409, 
210.350813872005)), class = "data.frame", row.names = c(NA, -100L
))

The question is quite ambiguous. Here is how to plot a histogram of the bootstrap estimates of the variance:

library(purrr)
boots <- 100
data <- structure(list(x = c(184.291514203183, 238.183299307855, 217.544606414151, 233.931926116624, 229.12042611005, 243.881689583996, 259.230802242781, 217.939619221934, 137.636923032685, 170.379447345948, 195.852641733122, 185.832690963969, 186.676714564328, 215.711426139253, 186.413495533494, 237.83223009147, 180.124153998503, 215.393108191779, 188.846039074142, 142.373198101437, 233.234630310378, 186.141325709762, 220.062112044187, 213.851199681057, 148.622198219149, 197.438771523918, 206.920961557603, 190.874857845699, 217.889075914836, 152.318099234166, 218.089620221194, 196.736930479919, 235.122424359223, 217.446826955801, 201.352404389309, 216.290374765672, 173.85609629461, 215.961826427613, 213.87732008193, 177.952521505061, 132.734879010504, 221.707886490889, 224.336488758995, 218.604034088911, 228.157844234374, 196.544661577149, 228.787736646279, 237.009125179319, 194.73342863066, 190.569523115323, 192.198491573128, 204.589742888237, 198.662802876867, 195.238634847898, 201.834508205684, 220.989134791548, 180.006492709174, 168.199898332071, 250.705048451896, 209.824701073225, 212.36145906497, 205.250728119598, 196.572466206237, 186.818746613236, 138.493748904934, 193.572713536688, 171.605082170236, 243.803356964054, 188.768040728907, 201.408088256783, 196.23847341016, 202.686141019735, 167.25735383257, 171.907526464761, 224.396425425799, 183.494470842407, 220.15969728649, 143.164453849305, 152.539942653094, 198.52004650272, 185.145815429412, 206.741840856439, 259.866591064748, 135.212011256414, 164.2297511973, 200.623731663392, 199.599177980586, 175.970651370212, 197.304554981825, 189.116019204125, 198.630618004183, 185.096675814379, 203.780160863916, 174.584831373708, 150.483001599829, 223.78078870159, 170.772181294322, 218.770812392057, 151.645084212409, 210.350813872005)), class = "data.frame", row.names = c(NA, -100L ))
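# Draw `boots` resamples of the same size as the data, with replacement
map(seq_len(boots),
    ~ data$x[sample.int(length(data$x), length(data$x), replace = TRUE)]
) %>% 
    map_dbl(var) %>%   # variance of each resample
    hist()             # histogram of the bootstrap variances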

Created on 2022-12-09 with reprex v2.0.2
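The same resampling can also be sketched in base R, without purrr. This is a minimal sketch rather than part of the original answer: `x` is simulated stand-in data (an assumption so the block runs on its own; substitute `malewt$x` for the real weights), and the names `n_boot` and `boot_vars` are hypothetical.

```r
set.seed(2022)

# Stand-in data (assumption): replace with malewt$x to use the real weights.
x <- rnorm(100, mean = 200, sd = 28)

n_boot <- 1000  # number of bootstrap resamples

# Each replicate draws length(x) values from x with replacement
# and records the sample variance of that resample.
boot_vars <- replicate(n_boot, var(sample(x, replace = TRUE)))

mean(boot_vars)  # mean of the bootstrap variances
sd(boot_vars)    # bootstrap standard error of the sample variance
hist(boot_vars, main = "Bootstrap distribution of the variance")
```

Using `replicate()` keeps only the statistic of each resample, so the resamples themselves are never stored.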

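Here is a solution with the boot package, which ships with R. In the case of the variance, it is easy to bootstrap.

  1. Create a function bootvar that computes the variance of each resample;
  2. The function must take the data and an index as its 1st and 2nd arguments, in that order. The index vector is what resamples from the data; boot creates it automatically for the user;
  3. Inside the function, extract the sample given by the index vector (i below);
  4. Compute and return the statistic of interest, var.

I wrote the function in two lines of code to make it clearer.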

library(boot)

set.seed(2022)

bootvar <- function(data, i) {
  y <- data$x[i]
  var(y)
}
b <- boot(malewt, bootvar, R = 5000)
b
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot(data = malewt, statistic = bootvar, R = 5000)
#> 
#> 
#> Bootstrap Statistics :
#>     original    bias    std. error
#> t1* 792.2551 -7.657891    106.0901

# bootstrapped variance
mean(b$t)
#> [1] 784.5972

hist(b$t)

Created on 2022-12-09 with reprex v2.0.2
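The columns in the printed output can be recomputed from the object that boot returns: `t0` holds the statistic on the original sample and `t` holds the bootstrap replicates. In the output above, 784.5972 - 792.2551 is about -7.658, which matches the bias column. The following is a minimal sketch on simulated stand-in data (an assumption; `dat`, `boot_bias`, and `boot_se` are hypothetical names), so its numbers will differ from those shown above.

```r
library(boot)

set.seed(2022)

# Stand-in data (assumption): replace dat with malewt to use the real weights.
dat <- data.frame(x = rnorm(100, mean = 200, sd = 28))

# Statistic function: data first, resampling index second.
bootvar <- function(data, i) var(data$x[i])
b <- boot(dat, bootvar, R = 2000)

b$t0                            # "original": variance of the observed sample
boot_bias <- mean(b$t) - b$t0   # "bias": mean of replicates minus original
boot_se   <- sd(b$t)            # "std. error": spread of the replicates
c(bias = boot_bias, se = boot_se)
```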


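It is also possible to write the bootstrap function so that it takes a vector as its first argument instead of a data.frame as above. The boot call is then adjusted accordingly. As long as the pseudo-RNG seed is set to the same value (2022 in this case), the results are exactly the same.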

library(boot)

set.seed(2022)

bootvar_x <- function(x, i) {
  y <- x[i]
  var(y)
}
bx <- boot(malewt$x, bootvar_x, R = 5000)
bx
#> 
#> ORDINARY NONPARAMETRIC BOOTSTRAP
#> 
#> 
#> Call:
#> boot(data = malewt$x, statistic = bootvar_x, R = 5000)
#> 
#> 
#> Bootstrap Statistics :
#>     original    bias    std. error
#> t1* 792.2551 -7.657891    106.0901

mean(bx$t)
#> [1] 784.5972

hist(bx$t)

Created on 2022-12-09 with reprex v2.0.2
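If an interval for the population variance is wanted in addition to a point estimate, the boot package's `boot.ci()` can be applied to the same kind of object. This is a hedged sketch that goes slightly beyond the original answer: it uses simulated stand-in data (an assumption; substitute `malewt$x`) and a percentile interval, which is only one of the interval types `boot.ci()` offers.

```r
library(boot)

set.seed(2022)

# Stand-in data (assumption): replace with malewt$x to use the real weights.
x <- rnorm(100, mean = 200, sd = 28)

bootvar_x <- function(x, i) var(x[i])
bx <- boot(x, bootvar_x, R = 2000)

# Percentile interval: based on quantiles of the replicates in bx$t.
ci <- boot.ci(bx, conf = 0.95, type = "perc")
ci$percent[4:5]  # lower and upper limits of the 95% interval
```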
