Coverage probability of confidence intervals

Question

From a Bernoulli(p) distribution, I want to calculate the coverage probability for various sample sizes (n = 10, 15, 20, 25, 30, 50, 100, 150, 200), and for each sample size at p = 0.01, 0.4, and 0.8.

This is my attempt, but it shows 0 everywhere except for p = 0.01:

f3 <- function(n,probs) {
  res1 <- lapply(n, function(i) {
    setNames(lapply(probs, function(p) {
      m<-10000
      n<-i
      p<-p
      x <- rbinom(m,size=1,p=p)
      p.hat <- x/n
      lower.Wald <- p.hat - 1.96 * sqrt(p.hat*(1-p.hat)/n)
      upper.Wald <- p.hat + 1.96 * sqrt(p.hat*(1-p.hat)/n)
      p.in.CI <- (lower.Wald <p) & ( p < upper.Wald )
      covprob1<- mean(p.in.CI)
      covprob1
    }),paste0("p=",probs))
  })
  names(res1) <- paste0("n=",n)
  res1
}
f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))

Answer 1

Background

The code in the question attempts to run Monte Carlo simulations on Bernoulli trials to calculate coverage percentages using Wald confidence intervals. One of the problems in the code is that a number of the calculations are executed on individual observations rather than sums of successes and failures. R is primarily a vector processor, and the code does not aggregate the individual observations into counts of successes and failures before calculating the Wald confidence intervals.

This causes the code to always generate a coverage percentage of 0 for p values above 0.01 at the sample sizes tested in the original post. We use code from the original post to isolate where the error is introduced into the algorithm.

We set a seed, assign values to m, n, and p, and attempt to generate 10,000 samples of n Bernoulli trials each.

set.seed(95014)
m<-10000
n<-5
p<-0.01
x <- rbinom(m,size=1,prob = p)

At this point x is a vector containing 10,000 values, each either 1 (true) or 0 (false).

> table(x)
x
   0    1 
9913   87

However, x is NOT 10,000 samples of 5 Bernoulli trials each. Given this fact, all subsequent processing by the algorithm in the original code will be incorrect.

The next line of code calculates a value for p.hat. This should be a single value aggregated across the 5 elements in a sample, not a vector of 10,000 elements, unless each element in x represents a 5-element sample.

p.hat <- x/n
table(p.hat)

> table(p.hat)
p.hat
   0  0.2 
9913   87

An accurate calculation of p.hat, treating the vector as one sample, would be the following:

> p.hat <- sum(x)/length(x)
> p.hat
[1] 0.0087

...which is very close to the population p-value of 0.01 that we assigned earlier in the code, but still does not represent 10,000 samples of size 5. Instead, p.hat as defined above represents a single sample of 10,000 Bernoulli trials.
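The zeros reported in the question follow from the values p.hat can take in the original code. Because x is 0 or 1, p.hat is either 0 (a zero-width interval that cannot contain p under strict inequalities) or 1/n. The sketch below, which is an illustration and not part of the original post, tabulates the Wald limits for p.hat = 1/n at each sample size; the names half.width and bounds are introduced here for convenience.

```r
# In the original code x is 0 or 1, so p.hat = x / n is either 0 or 1 / n.
n <- c(10, 15, 20, 25, 30, 50, 100, 150, 200)
p.hat <- 1 / n   # the only non-zero value p.hat can take

half.width <- 1.96 * sqrt(p.hat * (1 - p.hat) / n)
bounds <- data.frame(n,
                     lower.Wald = p.hat - half.width,
                     upper.Wald = p.hat + half.width)
print(round(bounds, 4))

# The largest upper limit occurs at n = 10 and is about 0.286,
# so p = 0.4 and p = 0.8 can never fall inside any interval.
max(bounds$upper.Wald)
```

Every upper limit is below 0.4 but above 0.01, and every lower limit is below 0.01, which is why only the p = 0.01 column showed non-zero coverage.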

Two minor changes to fix the code

After independently developing a Monte Carlo simulator for Bernoulli trials (see below for details), it becomes clear that with a couple of tweaks we can remediate the code from the original post to make it produce valid results.

First, we multiply m by n in the first argument to rbinom(), so the number of trials produced is 10,000 times the sample size. We also cast the result as a matrix with 10,000 rows and n columns.

Second, we use rowSums() to sum the trials into counts of successes, and divide the resulting vector of 10,000 elements by n, producing correct values for p.hat given the sample size. Once p.hat is corrected, the rest of the code works as originally intended.

f3 <- function(n,probs) {
     res1 <- lapply(n, function(i) {
          setNames(lapply(probs, function(p) {
               m<-10000
               n<-i
               p<-p
               # make number of trials m*n, and store 
               # as a matrix of 10,000 rows * n columns 
               x <- matrix(rbinom(m*n,size=1,prob = p),nrow=10000,ncol=i)
               # p.hat is simply rowSums(x) divided by n
               p.hat <- rowSums(x)/n
               lower.Wald <- p.hat - 1.96 * sqrt(p.hat*(1-p.hat)/n)
               upper.Wald <- p.hat + 1.96 * sqrt(p.hat*(1-p.hat)/n)
               p.in.CI <- (lower.Wald <p) & ( p < upper.Wald )
               covprob1<- mean(p.in.CI)
               covprob1
          }),paste0("p=",probs))
     })
     names(res1) <- paste0("n=",n)
     res1
}

f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))

...and the output:

> f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))
$`n=10`
$`n=10`$`p=0.01`
[1] 0.0983

$`n=10`$`p=0.4`
[1] 0.9016

$`n=10`$`p=0.8`
[1] 0.8881


$`n=15`
$`n=15`$`p=0.01`
[1] 0.1387

$`n=15`$`p=0.4`
[1] 0.9325

$`n=15`$`p=0.8`
[1] 0.8137


$`n=20`
$`n=20`$`p=0.01`
[1] 0.1836

$`n=20`$`p=0.4`
[1] 0.9303

$`n=20`$`p=0.8`
[1] 0.9163


$`n=25`
$`n=25`$`p=0.01`
[1] 0.2276

$`n=25`$`p=0.4`
[1] 0.94

$`n=25`$`p=0.8`
[1] 0.8852


$`n=30`
$`n=30`$`p=0.01`
[1] 0.2644

$`n=30`$`p=0.4`
[1] 0.9335

$`n=30`$`p=0.8`
[1] 0.9474


$`n=50`
$`n=50`$`p=0.01`
[1] 0.3926

$`n=50`$`p=0.4`
[1] 0.9421

$`n=50`$`p=0.8`
[1] 0.9371


$`n=100`
$`n=100`$`p=0.01`
[1] 0.6313

$`n=100`$`p=0.4`
[1] 0.9495

$`n=100`$`p=0.8`
[1] 0.9311

These results look more like what we expect from the simulation: poor coverage at low values of p and small sample sizes, and, for a given p-value, coverage that generally improves as sample size increases.
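As a side note, the sum of n Bernoulli(p) trials follows a Binomial(n, p) distribution, so the success counts can also be drawn directly with the size argument of rbinom(). The following is a minimal sketch of that shortcut for one combination of n and p; it is an alternative to the matrix approach above, not the code from the original post, and the variable names are chosen here for illustration.

```r
set.seed(95014)
m <- 10000   # number of simulated samples
n <- 30      # observations per sample
p <- 0.4     # true population probability

# Each element is the number of successes in one sample of n Bernoulli trials
successes <- rbinom(m, size = n, prob = p)
p.hat <- successes / n

# Wald interval for each sample, then the share of intervals containing p
half.width <- 1.96 * sqrt(p.hat * (1 - p.hat) / n)
covprob <- mean(p.hat - half.width < p & p < p.hat + half.width)
covprob
```

This avoids allocating an m-by-n matrix, which matters more as n grows.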

Starting from scratch: a basic simulator for one p-value / sample size

Here we develop a solution that iteratively builds on a set of basic building blocks: one p-value, one sample size, and a 95% confidence interval. The simulator also tracks parameters so we can combine results from multiple simulations into data frames that are easy to read and interpret.

First, we create a simulator that tests 10,000 samples of a given size drawn from a Bernoulli distribution with a given probability value. It aggregates successes and failures, calculates Wald confidence intervals, and generates an output data frame. For the purposes of the simulation, the p-values we pass to the simulator represent the "true" population probability value. We will see how frequently the simulations include the population p-value in their confidence intervals.

We set parameters to represent a true population p-value of 0.5, a sample size of 5, and a z-value of 1.96 representing a 95% confidence interval. We create function arguments for these constants so we can vary them in subsequent code. We also use set.seed() to make the results reproducible.

set.seed(90125)
simulationList <- lapply(1:10000,function(x,p_value,sample_size,z_val){
     trial <- x
     successes <- sum(rbinom(sample_size,size=1,prob = p_value))
     observed_p <- successes / sample_size
     z_value <- z_val
     lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
     upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
     data.frame(trial,p_value,observed_p,z_value,lower.Wald,upper.Wald)
},0.5,5,1.96)

A key difference between this code and the code from the original question is that we take samples of 5 from rbinom() and immediately sum the number of true values to calculate the number of successes. This allows us to calculate observed_p as successes / sample_size. Now we have an empirically generated version of what was called p.hat in the original question.

The resulting list includes one data frame summarizing the results of each trial.

We combine the list of data frames into a single data frame with do.call():

simulation_df <- do.call(rbind,simulationList)

At this point simulation_df is a data frame containing 10,000 rows and 6 columns. Each row represents the results from one simulation of sample_size Bernoulli trials. We'll print the first few rows to illustrate the contents of the data frame.

> dim(simulation_df)
[1] 10000     6
> head(simulation_df)
  trial p_value observed_p z_value  lower.Wald upper.Wald
1     1     0.5        0.6    1.96  0.17058551  1.0294145
2     2     0.5        0.2    1.96 -0.15061546  0.5506155
3     3     0.5        0.6    1.96  0.17058551  1.0294145
4     4     0.5        0.2    1.96 -0.15061546  0.5506155
5     5     0.5        0.2    1.96 -0.15061546  0.5506155
6     6     0.5        0.4    1.96 -0.02941449  0.8294145
>

Notice how the observed_p values are discrete values in increments of 0.2. This is because when the sample size is 5, the number of TRUE values in each sample can vary between 0 and 5. A histogram of observed_p makes this clear.

Even with a sample size of 5, we can see the shape of a binomial distribution emerging in the histogram.
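A histogram like the one described can be reproduced with a few lines. The sketch below regenerates comparable data on its own rather than reusing simulation_df, so its counts will not match the run above exactly; it also compares the empirical frequencies with the Binomial(5, 0.5) probabilities from dbinom(). The variable names are assumptions made for this illustration.

```r
set.seed(90125)
sample_size <- 5
successes <- rbinom(10000, size = sample_size, prob = 0.5)
observed_p <- successes / sample_size

# Empirical share of each possible outcome (0 to 5 successes)
empirical <- as.numeric(prop.table(table(factor(successes, levels = 0:sample_size))))
# Theoretical Binomial(5, 0.5) probabilities for the same outcomes
theoretical <- dbinom(0:sample_size, size = sample_size, prob = 0.5)
print(round(data.frame(observed_p = (0:sample_size) / sample_size,
                       empirical, theoretical), 4))

# One bar per possible value of observed_p: 0, 0.2, ..., 1
hist(observed_p, breaks = seq(-0.1, 1.1, by = 0.2),
     main = "observed_p for samples of size 5", xlab = "observed_p")
```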

Next, we calculate the coverage percentage by counting the rows where the population p-value (represented as p_value) is within the Wald confidence interval.

# calculate coverage: % of simulations where population p-value is
# within Wald confidence limits generated via simulation
sum(simulation_df$p_value > simulation_df$lower.Wald & 
         simulation_df$p_value < simulation_df$upper.Wald) / 10000 * 100

 > sum(simulation_df$p_value > simulation_df$lower.Wald & 
+          simulation_df$p_value < simulation_df$upper.Wald) / 10000 * 100
[1] 93.54

A coverage of 93.54% is a reasonable result, given that we calculated a 95% confidence interval. We interpret this as follows: 93.5% of the samples generated Wald confidence intervals that included the population p-value of 0.5.

Therefore, we conclude that our simulator appears to be generating valid results. We will build on this basic design to execute simulations with multiple p-values and sample sizes.
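One way to cross-check a simulated coverage figure is to compute the exact coverage, since a sample of size n has only n + 1 possible outcomes. The sketch below sums the binomial probabilities of the outcomes whose Wald interval contains p. The function name exactWaldCoverage is hypothetical and introduced here; it is not part of the original answer.

```r
# Exact coverage of the Wald interval: enumerate every possible number of
# successes k, and add up P(k) for the outcomes whose interval contains p.
exactWaldCoverage <- function(n, p, z = 1.96) {
  k <- 0:n
  p.hat <- k / n
  half.width <- z * sqrt(p.hat * (1 - p.hat) / n)
  covered <- (p.hat - half.width < p) & (p < p.hat + half.width)
  sum(dbinom(k, size = n, prob = p)[covered])
}

exactWaldCoverage(5, 0.5)   # 0.9375
```

For n = 5 and p = 0.5 only the outcomes of 0 and 5 successes produce intervals that miss p, so the exact coverage is 1 - 2/32 = 93.75%, in line with the simulated 93.54%.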

Simulating multiple p-values for a given sample size

Next, we'll vary the probability values to see the percentage coverage for 10,000 samples of 5 observations. Since the statistics literature, such as Sauro and Lewis (2005), tells us that Wald confidence intervals have poor coverage for very low and very high p-values, we've added an argument to calculate Adjusted Wald confidence intervals. We'll set this argument to FALSE for the time being.

p_val_simulations <- lapply(c(0.01,0.1,0.4,.5,.8),function(p_val){
     aSim <- lapply(1:10000,function(x,p_value,sample_size,z_val,adjWald){
          trial <- x
          successes <- sum(rbinom(sample_size,size=1,prob = p_value))
          if(adjWald){
               successes <- successes + 2
               sample_size <- sample_size + 4
          }
          observed_p <- sum(successes) / (sample_size)
          z_value <- z_val
          lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
          upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
          data.frame(trial,p_value,sample_size,observed_p,z_value,adjWald,lower.Wald,upper.Wald)
     },p_val,5,1.96,FALSE)
     # bind results to 1 data frame & return 
     do.call(rbind,aSim)
})

The resulting list, p_val_simulations, contains one data frame for each p-value run through the simulation.

We combine these data frames and calculate coverage percentages as follows.

do.call(rbind,lapply(p_val_simulations,function(x){
     p_value <- min(x$p_value)
     adjWald <- as.logical(min(x$adjWald))
     sample_size <- min(x$sample_size) - (as.integer(adjWald) * 4)
     coverage_pct <- (sum(x$p_value > x$lower.Wald & 
              x$p_value < x$upper.Wald) / 10000)*100
     data.frame(p_value,sample_size,adjWald,coverage_pct)
     
}))

As expected, coverage gets very poor the further we are from a p-value of 0.5.

  p_value sample_size adjWald coverage_pct
1    0.01           5   FALSE         4.53
2    0.10           5   FALSE        40.23
3    0.40           5   FALSE        83.49
4    0.50           5   FALSE        94.19
5    0.80           5   FALSE        66.35

However, when we rerun the simulation with adjWald = TRUE, we get the following results.

  p_value sample_size adjWald coverage_pct
1    0.01           5    TRUE        95.47
2    0.10           5    TRUE        91.65
3    0.40           5    TRUE        98.95
4    0.50           5    TRUE        94.19
5    0.80           5    TRUE        94.31

These are much better, particularly for p-values close to the ends of the distribution.
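The adjustment used in the simulation adds 2 successes and 4 observations before applying the usual Wald formula (a rounding of z²/2 and z² for z = 1.96). A minimal sketch for a single sample shows why this helps at the extremes; the function name adjustedWald is hypothetical and introduced here for illustration.

```r
# Adjusted Wald interval for one sample: add 2 successes and 4 observations,
# then apply the ordinary Wald formula to the adjusted values.
adjustedWald <- function(successes, n, z = 1.96) {
  n_adj <- n + 4
  p_adj <- (successes + 2) / n_adj
  half.width <- z * sqrt(p_adj * (1 - p_adj) / n_adj)
  c(lower = p_adj - half.width, upper = p_adj + half.width)
}

# Zero successes in 5 trials: the unadjusted interval has zero width,
# while the adjusted interval still spans small values of p.
adjustedWald(0, 5)
adjustedWald(1, 5)
```

With 0 successes out of 5 the unadjusted interval collapses to a single point, whereas the adjusted interval still contains p = 0.01. Most samples drawn at p = 0.01 contain no successes, which is consistent with the jump in coverage in the first row of the table.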

The final task remaining is to modify the code so it executes Monte Carlo simulations at varying levels of sample size. Before proceeding further, we calculate the runtime for the code we've developed thus far.

system.time() tells us that the code to run 5 different Monte Carlo simulations of 10,000 Bernoulli trials with a sample size of 5 takes about 38 seconds to run on a MacBook Pro 15 with a 2.5 GHz Intel i7 processor. Therefore, we expect that the next simulation will take multiple minutes to run.

Varying p-value and sample size

We add another level of lapply() to account for varying the sample size. We have also set the adjWald parameter to FALSE so we can see how the base Wald confidence interval behaves at p = 0.01 and 0.10.

set.seed(95014)
system.time(sample_simulations <- lapply(c(10, 15, 20, 25, 30, 50,100, 150, 200),function(s_size){
     lapply(c(0.01,0.1,0.8),function(p_val){
          aSim <- lapply(1:10000,function(x,p_value,sample_size,z_val,adjWald){
               trial <- x
               successes <- sum(rbinom(sample_size,size=1,prob = p_value))
               if(adjWald){
                    successes <- successes + 2
                    sample_size <- sample_size + 4
               }
               observed_p <- sum(successes) / (sample_size)
               z_value <- z_val
               lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
               upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
               data.frame(trial,p_value,sample_size,observed_p,z_value,adjWald,lower.Wald,upper.Wald)
          },p_val,s_size,1.96,FALSE)
          # bind results to 1 data frame & return 
          do.call(rbind,aSim)
     })
}))

Elapsed time on the MacBook Pro was 217.47 seconds, or about 3.6 minutes. Given that we ran 27 different Monte Carlo simulations, the code completed one simulation every 8.05 seconds.

The final step is to process the list of lists to create an output data frame that summarizes the analysis. We aggregate the content, combine rows into data frames, then bind the resulting list of data frames.

summarizedSimulations <- lapply(sample_simulations,function(y){
     do.call(rbind,lapply(y,function(x){
          p_value <- min(x$p_value)
          adjWald <- as.logical(min(x$adjWald))
          sample_size <- min(x$sample_size) - (as.integer(adjWald) * 4)
          coverage_pct <- (sum(x$p_value > x$lower.Wald & 
                                    x$p_value < x$upper.Wald) / 10000)*100
          data.frame(p_value,sample_size,adjWald,coverage_pct)
          
     }))
})

results <- do.call(rbind,summarizedSimulations)

As a last step, we sort the data by p-value to see how coverage improves as sample size increases.

results[order(results$p_value,results$sample_size),]

...and the output:

> results[order(results$p_value,results$sample_size),]
   p_value sample_size adjWald coverage_pct
1     0.01          10   FALSE         9.40
4     0.01          15   FALSE        14.31
7     0.01          20   FALSE        17.78
10    0.01          25   FALSE        21.40
13    0.01          30   FALSE        25.62
16    0.01          50   FALSE        39.65
19    0.01         100   FALSE        63.67
22    0.01         150   FALSE        77.94
25    0.01         200   FALSE        86.47
2     0.10          10   FALSE        64.25
5     0.10          15   FALSE        78.89
8     0.10          20   FALSE        87.26
11    0.10          25   FALSE        92.10
14    0.10          30   FALSE        81.34
17    0.10          50   FALSE        88.14
20    0.10         100   FALSE        93.28
23    0.10         150   FALSE        92.79
26    0.10         200   FALSE        92.69
3     0.80          10   FALSE        88.26
6     0.80          15   FALSE        81.33
9     0.80          20   FALSE        91.88
12    0.80          25   FALSE        88.38
15    0.80          30   FALSE        94.67
18    0.80          50   FALSE        93.44
21    0.80         100   FALSE        92.96
24    0.80         150   FALSE        94.48
27    0.80         200   FALSE        93.98
>
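A long table like this can be easier to scan with sample sizes as rows and p-values as columns. The sketch below shows one way to do that with xtabs(), using only the first three sample sizes copied from the output above so that it runs on its own; the object names results_subset and wide are introduced here for illustration.

```r
# Rows for n = 10, 15, 20 copied from the unadjusted Wald output above
results_subset <- data.frame(
  p_value      = rep(c(0.01, 0.10, 0.80), each = 3),
  sample_size  = rep(c(10, 15, 20), times = 3),
  coverage_pct = c( 9.40, 14.31, 17.78,
                   64.25, 78.89, 87.26,
                   88.26, 81.33, 91.88)
)

# Each (sample_size, p_value) cell holds exactly one value,
# so the sum computed by xtabs() is just that value.
wide <- xtabs(coverage_pct ~ sample_size + p_value, data = results_subset)
wide
```

With the full results data frame, the same call would produce a 9-by-3 table.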

Interpreting the results

The Monte Carlo simulations illustrate that Wald confidence intervals provide poor coverage at a p-value of 0.01, even with a sample size of 200. Coverage improves at a p-value of 0.10, where four of the six simulations at sample sizes 25 and above exceeded 90%. Coverage is even better for the p-value of 0.80, where all but one of the sample sizes above 15 exceeded 91% coverage.

Coverage improves further when we calculate Adjusted Wald confidence intervals, especially at lower p-values.

results[order(results$p_value,results$sample_size),]
   p_value sample_size adjWald coverage_pct
1     0.01          10    TRUE        99.75
4     0.01          15    TRUE        98.82
7     0.01          20    TRUE        98.30
10    0.01          25    TRUE        97.72
13    0.01          30    TRUE        99.71
16    0.01          50    TRUE        98.48
19    0.01         100    TRUE        98.25
22    0.01         150    TRUE        98.05
25    0.01         200    TRUE        98.34
2     0.10          10    TRUE        93.33
5     0.10          15    TRUE        94.53
8     0.10          20    TRUE        95.61
11    0.10          25    TRUE        96.72
14    0.10          30    TRUE        96.96
17    0.10          50    TRUE        97.28
20    0.10         100    TRUE        95.06
23    0.10         150    TRUE        96.15
26    0.10         200    TRUE        95.44
3     0.80          10    TRUE        97.06
6     0.80          15    TRUE        98.10
9     0.80          20    TRUE        95.57
12    0.80          25    TRUE        94.88
15    0.80          30    TRUE        96.31
18    0.80          50    TRUE        95.05
21    0.80         100    TRUE        95.37
24    0.80         150    TRUE        94.62
27    0.80         200    TRUE        95.96

The Adjusted Wald confidence intervals provide consistently better coverage across the range of p-values and sample sizes, with an average coverage of 96.72% across the 27 simulations. This is consistent with the literature that indicates Adjusted Wald confidence intervals are more conservative than unadjusted Wald confidence intervals.
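The average quoted above can be checked directly from the Adjusted Wald table. The following sketch copies the 27 coverage percentages from that output (in order of p-value, then sample size) and averages them; the vector names are introduced here for illustration.

```r
# Adjusted Wald coverage percentages copied from the table above
adj_coverage <- c(
  99.75, 98.82, 98.30, 97.72, 99.71, 98.48, 98.25, 98.05, 98.34,  # p = 0.01
  93.33, 94.53, 95.61, 96.72, 96.96, 97.28, 95.06, 96.15, 95.44,  # p = 0.10
  97.06, 98.10, 95.57, 94.88, 96.31, 95.05, 95.37, 94.62, 95.96   # p = 0.80
)
p_value <- rep(c(0.01, 0.10, 0.80), each = 9)

round(mean(adj_coverage), 2)            # 96.72
# Average coverage for each p-value separately
tapply(adj_coverage, p_value, mean)
```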

At this point we have a working Monte Carlo simulator that produces valid results for multiple p-values and sample sizes. We can now review the code to find opportunities to optimize its performance.

Optimizing the solution

Following the old programming aphorism "Make it work, make it right, make it fast", working the solution out in an iterative manner enabled me to develop a solution that produces valid results.

Understanding how to make it right not only enabled me to see the flaw in the code posted in the question, but also enabled me to envision a solution. That solution, using rbinom() once with an argument of m * n, casting the result as a matrix(), and then using rowSums() to calculate p.hat, led me to see how I could optimize my own solution by eliminating thousands of rbinom() calls from each simulation.

Refactoring for performance

We create a function, binomialSimulation(), that generates Bernoulli trials and Wald confidence intervals with a single call to rbinom(), regardless of the number of trials in a single simulation. We also aggregate results so each simulation generates a data frame containing one row describing the results of the test.

set.seed(90125)
binomialSimulation <- function(trial_size,p_value,sample_size,z_value){
     trials <- matrix(rbinom(trial_size * sample_size,size=1,prob = p_value),
                      nrow = trial_size,ncol = sample_size)
     observed_p <- rowSums(trials) / sample_size
     lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
     upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
     # divide by the number of trials actually run, not a hard-coded 10000
     coverage_pct <- sum(p_value > lower.Wald & 
                         p_value < upper.Wald) / trial_size * 100
     data.frame(sample_size,p_value,avg_observed_p=mean(observed_p),coverage_pct)
     
}

We run the function with a population p-value of 0.5, a sample size of 5, 10,000 trials, and a confidence level of 95%, and track the execution time with system.time(). The optimized function takes about 99.8% less time than the original implementation described earlier in the article, which runs in about 6.09 seconds.

system.time(binomialSimulation(10000,0.5,5,1.96))

> system.time(binomialSimulation(10000,0.5,5,1.96))
   user  system elapsed 
  0.015   0.000   0.015
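The 99.8% figure follows from the two elapsed times reported in this answer. A short arithmetic sketch, with variable names chosen here for illustration:

```r
original_secs  <- 6.09    # elapsed time quoted for the original implementation
optimized_secs <- 0.015   # elapsed time reported by system.time() above

# Percentage reduction in run time
round((1 - optimized_secs / original_secs) * 100, 1)   # 99.8
# Equivalent speed-up factor
original_secs / optimized_secs                          # 406
```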

We will skip the intermediate steps and present the optimized version of the iteratively developed solution.

system.time(results <- do.call(rbind,lapply(c(5,10,15,20,25,50,100,250),
                                function(aSample_size,p_values) {
     do.call(rbind,lapply(p_values,function(a,b,c,d){
             binomialSimulation(p_value = a,
                                trial_size = b,
                                sample_size = aSample_size,
                                z_value = d)
     },10000,5,1.96))
},c(0.1,0.4,0.8))))

As expected, eliminating the thousands of unnecessary calls to rbinom() radically improves the performance of the solution.

   user  system elapsed 
  0.777   0.053   0.830

Given that our prior solution ran in 217 seconds, the performance of the optimized version is really impressive. Now we have a solution that not only generates accurate Monte Carlo simulations of Bernoulli trials, but is also fast.
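The nested lapply() driver can also be flattened by building the grid of parameter combinations first. The sketch below is one possible arrangement, not the code from the answer: simulateCoverage() is a hypothetical, compact stand-in for binomialSimulation() that draws success counts directly with rbinom(size = ...), so that the block runs on its own.

```r
# Compact stand-in for binomialSimulation(): one row of results per call
simulateCoverage <- function(trial_size, p_value, sample_size, z_value = 1.96) {
  successes <- rbinom(trial_size, size = sample_size, prob = p_value)
  observed_p <- successes / sample_size
  half.width <- z_value * sqrt(observed_p * (1 - observed_p) / sample_size)
  covered <- p_value > observed_p - half.width & p_value < observed_p + half.width
  data.frame(sample_size, p_value,
             avg_observed_p = mean(observed_p),
             coverage_pct = mean(covered) * 100)
}

set.seed(90125)
# Every combination of sample size and population probability
grid <- expand.grid(sample_size = c(5, 10, 15, 20, 25, 50, 100, 250),
                    p_value = c(0.1, 0.4, 0.8))

# Run one simulation of 10,000 samples per grid row and stack the results
results <- do.call(rbind, Map(function(s, p) simulateCoverage(10000, p, s),
                              grid$sample_size, grid$p_value))
results[order(results$p_value, results$sample_size), ]
```

Keeping the parameter grid in a data frame makes it straightforward to add further dimensions, such as the z-value, without another level of nesting.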

Answer 1
4 2020-11-27 04:30:52
