[英]coverage probability for confidence interval
From the Bernoulli(p), I want to calculate the coverage probability for various sample sizes (n= 10, 15, 20, 25, 30, 50, 100, 150, 200), and for each sample size at p = 0.01, 0.4, and 0.8.根据伯努利(p),我想计算各种样本大小(n = 10、15、20、25、30、50、100、150、200)以及 p = 0.01 时的每个样本大小的覆盖概率, 0.4 和 0.8。
this is my attempt but shows 0 everywhere except for p=0.01这是我的尝试,但除了 p=0.01 外,其他地方都显示为 0
f3 <- function(n,probs) {
res1 <- lapply(n, function(i) {
setNames(lapply(probs, function(p) {
m<-10000
n<-i
p<-p
x <- rbinom(m,size=1,p=p)
p.hat <- x/n
lower.Wald <- p.hat - 1.96 * sqrt(p.hat*(1-p.hat)/n)
upper.Wald <- p.hat + 1.96 * sqrt(p.hat*(1-p.hat)/n)
p.in.CI <- (lower.Wald <p) & ( p < upper.Wald )
covprob1<- mean(p.in.CI)
covprob1
}),paste0("p=",probs))
})
names(res1) <- paste0("n=",n)
res1
}
f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))
The code in the question attempts to run Monte Carlo simulations on Bernoulli trials to calculate coverage percentages using Wald confidence intervals.问题中的代码尝试在伯努利试验上运行蒙特卡罗模拟,以使用 Wald 置信区间计算覆盖率。 One of the problems in the code is that a number of the calculations are executed on individual observations rather than sums of successes and failures.
代码中的一个问题是,许多计算是根据单个观察而不是成功和失败的总和来执行的。 R is primarily a vector processor and the code does not aggregate the individual observations to counts of successes and failures in order to calculate the Wald confidence intervals.
R 主要是一个矢量处理器,代码不会将单个观察结果汇总为成功和失败的计数以计算 Wald 置信区间。
This causes the code to always generate 0 for coverage percentage for p values above 0.01 for sample sizes tested in the original post.这会导致代码始终为原始帖子中测试的样本大小的 p 值高于 0.01 的覆盖率生成 0。 We use code from the original post to isolate where error is introduced into the algorithm.
我们使用原始帖子中的代码来隔离将错误引入算法的位置。
We set a seed, assign values to m
, n
, and p
, and attempt to generate 10,000 Bernoulli trials of size n
.我们设置一个种子,为
m
、 n
和p
赋值,并尝试生成 10,000 个大小为n
的伯努利试验。
set.seed(95014)
m<-10000
n<-5
p<-0.01
x <- rbinom(m,size=1,prob = p)
At this point x
is a vector containing 10,000 true = 1, false = 0 values.此时
x
是一个包含 10,000 个真 = 1、假 = 0 值的向量。
> table(x)
x
0 1
9913 87
However, x
is NOT 10,000 runs of samples of 5 Bernoulli trials.但是,
x
不是5 次伯努利试验的 10,000 次运行样本。 Given this fact, all subsequent processing by the algorithm in the original code will be incorrect.鉴于这一事实,原始代码中算法的所有后续处理都将是不正确的。
The next line of code calculates a value for p.hat
.下一行代码计算
p.hat
的值。 This should be a single value aggregated across the 5 elements in a sample, not a vector of 10,000 elements unless each element in x represents a 5 element sample.这应该是样本中 5 个元素聚合的单个值,而不是 10,000 个元素的向量,除非 x 中的每个元素都代表 5 个元素的样本。
p.hat <- x/n
table(p.hat)
> table(p.hat)
p.hat
0 0.2
9913 87
An accurate calculation for p.hat
, treating the vector as one sample would be the following: p.hat
的准确计算,将向量视为一个样本如下:
> p.hat <- sum(x)/length(x)
> p.hat
[1] 0.0087
...which is very close to the population p-value of 0.01 that we assigned earlier in the code, but still does not represent 10,000 trials of sample size 5. Instead, p.hat
as defined above represents one Bernoulli trial with sample size 10,000. ...这非常接近我们之前在代码中指定的总体 p 值 0.01,但仍不代表样本量为 5 的 10,000 次试验。相反,上面定义的
p.hat
代表样本量的一次伯努利试验10,000。
After independently developing a Monte Carlo simulator for Bernoulli trials (see below for details), it becomes clear that with a couple of tweaks we can remediate the code from the original post to make it produce valid results.在为伯努利试验独立开发了蒙特卡洛模拟器后(详见下文),很明显,通过一些调整,我们可以修复原始帖子中的代码,使其产生有效的结果。
First, we multiply m
by n
in the first argument to rbinom()
, so the number of trials produced is 10,000 times sample size.首先,我们在
rbinom()
的第一个参数中将m
乘以n
,因此产生的试验次数是样本大小的 10,000 倍。 We also cast the result as a matrix with 10,000 rows and n
columns.我们还将结果转换为具有 10,000 行和
n
列的矩阵。
Second, we use rowSums()
to sum the trials to counts of successes, and divide the resulting vector of 10,000 elements by n
, producing correct values for p.hat
, given sample size.其次,我们使用
rowSums()
将试验与成功计数相加,并将 10,000 个元素的结果向量除以n
,在给定样本大小的情况下为p.hat
生成正确的值。 Once p.hat
is corrected, the rest of the code works as originally intended.更正
p.hat
后,代码的 rest 将按原计划工作。
f3 <- function(n,probs) {
res1 <- lapply(n, function(i) {
setNames(lapply(probs, function(p) {
m<-10000
n<-i
p<-p
# make number of trials m*n, and store
# as a matrix of 10,000 rows * n columns
x <- matrix(rbinom(m*n,size=1,prob = p),nrow=10000,ncol=i)
# p.hat is simply rowSums(x) divided by n
p.hat <- rowSums(x)/n
lower.Wald <- p.hat - 1.96 * sqrt(p.hat*(1-p.hat)/n)
upper.Wald <- p.hat + 1.96 * sqrt(p.hat*(1-p.hat)/n)
p.in.CI <- (lower.Wald <p) & ( p < upper.Wald )
covprob1<- mean(p.in.CI)
covprob1
}),paste0("p=",probs))
})
names(res1) <- paste0("n=",n)
res1
}
f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))
...and the output: ...和 output:
> f3(n=c(10,15,20,25,30,50,100,150,200),probs = c(0.01,0.4, 0.8))
$`n=10`
$`n=10`$`p=0.01`
[1] 0.0983
$`n=10`$`p=0.4`
[1] 0.9016
$`n=10`$`p=0.8`
[1] 0.8881
$`n=15`
$`n=15`$`p=0.01`
[1] 0.1387
$`n=15`$`p=0.4`
[1] 0.9325
$`n=15`$`p=0.8`
[1] 0.8137
$`n=20`
$`n=20`$`p=0.01`
[1] 0.1836
$`n=20`$`p=0.4`
[1] 0.9303
$`n=20`$`p=0.8`
[1] 0.9163
$`n=25`
$`n=25`$`p=0.01`
[1] 0.2276
$`n=25`$`p=0.4`
[1] 0.94
$`n=25`$`p=0.8`
[1] 0.8852
$`n=30`
$`n=30`$`p=0.01`
[1] 0.2644
$`n=30`$`p=0.4`
[1] 0.9335
$`n=30`$`p=0.8`
[1] 0.9474
$`n=50`
$`n=50`$`p=0.01`
[1] 0.3926
$`n=50`$`p=0.4`
[1] 0.9421
$`n=50`$`p=0.8`
[1] 0.9371
$`n=100`
$`n=100`$`p=0.01`
[1] 0.6313
$`n=100`$`p=0.4`
[1] 0.9495
$`n=100`$`p=0.8`
[1] 0.9311
These results look more like what we expect from the simulation: poor coverage at low values of p / small sample sizes, where for a given p value coverage improves as sample size increases.这些结果看起来更像我们对模拟的期望:在低 p 值/小样本量下覆盖率较差,对于给定的 p 值,覆盖率随着样本量的增加而提高。
Here we develop a solution that iteratively builds on a set of basic building blocks: one p-value, one sample size, and a 95% confidence interval.在这里,我们开发了一个解决方案,它以一组基本构建块迭代构建:一个 p 值、一个样本大小和 95% 置信区间。 The simulator also tracks parameters so we can combine results from multiple simulations into data frames that are easy to read and interpret.
该模拟器还跟踪参数,因此我们可以将多个模拟的结果组合成易于阅读和解释的数据帧。
First, we create a simulator that tests 10,000 samples of size drawn from a Bernoulli distribution with a given probability value.首先,我们创建了一个模拟器,用于测试从具有给定概率值的伯努利分布中抽取的 10,000 个大小样本。 It aggregates successes and failures, and then calculates Wald confidence intervals, and generates an output data frame.
它聚合成功和失败,然后计算 Wald 置信区间,并生成 output 数据帧。 For the purposes of the simulation, the p-values we pass to the simulator represent the the "true" population probability value.
出于模拟的目的,我们传递给模拟器的 p 值代表“真实”总体概率值。 We will see how frequently the simulations include the population p-value in their confidence intervals.
我们将看到模拟在置信区间中包含总体 p 值的频率。
We set parameters to represent a true population p-value of 0.5, a sample size of 5, and z-value of 1.96 representing a 95% confidence interval.我们设置参数来表示真实的总体 p 值为 0.5、样本量为 5、z 值为 1.96,表示 95% 的置信区间。 We created function arguments for these constants so we can vary them in subsequent code.
我们为这些常量创建了 function arguments,因此我们可以在后续代码中更改它们。 We also use
set.seed()
to make the results reproducible.我们还使用
set.seed()
使结果可重现。
set.seed(90125)
simulationList <- lapply(1:10000,function(x,p_value,sample_size,z_val){
trial <- x
successes <- sum(rbinom(sample_size,size=1,prob = p_value))
observed_p <- successes / sample_size
z_value <- z_val
lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
data.frame(trial,p_value,observed_p,z_value,lower.Wald,upper.Wald)
},0.5,5,1.96)
A key difference between this code and the code from the original question is that we take samples of 5 from rbinom()
and immediately sum the number of true values to calculate the number of successes.此代码与原始问题的代码之间的关键区别在于,我们从
rbinom()
中抽取 5 个样本,并立即将真实值的数量相加以计算成功的数量。 This allows us to calculate observed_p
as successes / sample_size
.这使我们可以将
observed_p
计算为successes / sample_size
。 Now we have an empirically generated version of what was called p.hat
in the original question.现在我们有了原始问题中称为
p.hat
的经验生成版本。
The resulting list includes a data frame summarizing the results of each trial.结果列表包括一个总结每个试验结果的数据框。
We combine the list of data frames into a single data frame with do.call()
我们使用
do.call()
将数据帧列表组合成单个数据帧
simulation_df <- do.call(rbind,simulationList)
At this point simulation_df
is a data frame containing 10,000 rows and 6 columns.此时
simulation_df
是一个包含10000行6列的数据框。 Each row represents the results from one simulation of sample_size
Bernoulli trials.每行代表一次
sample_size
Bernoulli 试验模拟的结果。 We'll print the first few rows to illustrate the contents of the data frame.我们将打印前几行来说明数据框的内容。
> dim(simulation_df)
[1] 10000 6
> head(simulation_df)
trial p_value observed_p z_value lower.Wald upper.Wald
1 1 0.5 0.6 1.96 0.17058551 1.0294145
2 2 0.5 0.2 1.96 -0.15061546 0.5506155
3 3 0.5 0.6 1.96 0.17058551 1.0294145
4 4 0.5 0.2 1.96 -0.15061546 0.5506155
5 5 0.5 0.2 1.96 -0.15061546 0.5506155
6 6 0.5 0.4 1.96 -0.02941449 0.8294145
>
Notice how the observed_p
values are distinct values in increments of 0.2.请注意
observed_p
值是如何以0.2 为增量的不同值。 This is because when sample size is 5, the number of TRUE values in each sample can vary between 0 and 5. A histogram of observed_p
makes this clear.这是因为当样本大小为 5 时,每个样本中 TRUE 值的数量可以在 0 到 5 之间
observed_p
。observed_p 的直方图清楚地说明了这一点。
Even with a sample size of 5, we can see the shape of a binomial distribution emerging in the histogram.即使样本大小为 5,我们也可以在直方图中看到二项分布的形状。
Next, we calculate the coverage percentage by summing the rows where the population p-value (represented as p_value
) is within the Wald confidence interval.接下来,我们通过对总体 p 值(表示为
p_value
)在 Wald 置信区间内的行求和来计算覆盖率。
# calculate coverage: % of simulations where population p-value is
# within Wald confidence limits generated via simulation
sum(simulation_df$p_value > simulation_df$lower.Wald &
simulation_df$p_value < simulation_df$upper.Wald) / 10000 * 100
> sum(simulation_df$p_value > simulation_df$lower.Wald &
+ simulation_df$p_value < simulation_df$upper.Wald) / 10000 * 100
[1] 93.54
A coverage of 93.54% is a reasonable simulation, given that we calculated a 95% confidence interval.考虑到我们计算了 95% 的置信区间,93.54% 的覆盖率是合理的模拟。 We interpret this as 93.5% of the samples generated Wald confidence intervals that included the population p-value of 0.5.
我们将此解释为 93.5% 的样本生成了 Wald 置信区间,其中包括 0.5 的总体 p 值。
Therefore, we conclude that our simulator appears to be generating valid results.因此,我们得出结论,我们的模拟器似乎正在生成有效的结果。 We will build on this basic design to execute simulations with multiple p-values and sample sizes.
我们将在此基本设计的基础上执行具有多个 p 值和样本大小的模拟。
Next, we'll vary the probability values to see the percentage coverage for 10,000 samples of 5 observations.接下来,我们将改变概率值以查看 5 个观测值的 10,000 个样本的覆盖百分比。 Since the statistics literature such as Sauro and Lewis, 2005 tells us that Wald confidence intervals have poor coverage for very low and very high p-values, we've added an argument to calculate Adjusted Wald scores.
由于Sauro 和 Lewis, 2005等统计文献告诉我们,Wald 置信区间对于非常低和非常高的 p 值的覆盖率都很差,因此我们添加了一个参数来计算调整后的 Wald 分数。 We'll set this argument to
FALSE
for the time being.我们暂时将此参数设置为
FALSE
。
p_val_simulations <- lapply(c(0.01,0.1,0.4,.5,.8),function(p_val){
aSim <- lapply(1:10000,function(x,p_value,sample_size,z_val,adjWald){
trial <- x
successes <- sum(rbinom(sample_size,size=1,prob = p_value))
if(adjWald){
successes <- successes + 2
sample_size <- sample_size + 4
}
observed_p <- sum(successes) / (sample_size)
z_value <- z_val
lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
data.frame(trial,p_value,sample_size,observed_p,z_value,adjWald,lower.Wald,upper.Wald)
},p_val,5,1.96,FALSE)
# bind results to 1 data frame & return
do.call(rbind,aSim)
})
The resulting list, p_val_simulations
contains one data frame for each p-value run through the simulation.结果列表
p_val_simulations
包含一个数据帧,用于模拟运行的每个 p 值。
We combine these data frames and calculate coverage percentages as follows.我们结合这些数据框并计算覆盖率百分比如下。
do.call(rbind,lapply(p_val_simulations,function(x){
p_value <- min(x$p_value)
adjWald <- as.logical(min(x$adjWald))
sample_size <- min(x$sample_size) - (as.integer(adjWald) * 4)
coverage_pct <- (sum(x$p_value > x$lower.Wald &
x$p_value < x$upper.Wald) / 10000)*100
data.frame(p_value,sample_size,adjWald,coverage_pct)
}))
As expected, the coverage is very poor the further we are away from a p-value of 0.5.正如预期的那样,我们离 p 值 0.5 越远,覆盖率就越差。
p_value sample_size adjWald coverage_pct
1 0.01 5 FALSE 4.53
2 0.10 5 FALSE 40.23
3 0.40 5 FALSE 83.49
4 0.50 5 FALSE 94.19
5 0.80 5 FALSE 66.35
However, when we rerun the simulation with adjWald = TRUE
, we get the following results.但是,当我们使用
adjWald = TRUE
重新运行模拟时,我们会得到以下结果。
p_value sample_size adjWald coverage_pct
1 0.01 5 TRUE 95.47
2 0.10 5 TRUE 91.65
3 0.40 5 TRUE 98.95
4 0.50 5 TRUE 94.19
5 0.80 5 TRUE 94.31
These are much better, particularly for p-values close the the ends of the distribution.这些要好得多,特别是对于接近分布末端的 p 值。
The final task remaining is to modify the code so it executes Monte Carlo simulations at varying levels of sample size.剩下的最后一项任务是修改代码,以便在不同级别的样本量下执行蒙特卡罗模拟。 Before proceeding further, we calculate the runtime for the code we've developed thus far.
在继续之前,我们计算到目前为止我们开发的代码的运行时间。
system.time()
tells us that the code to run 5 different Monte Carlo simulations of 10,000 Bernoulli trials with sample size of 5 takes about 38 seconds to run on a MacBook Pro 15 with a 2.5 Ghz Intel i-7 processor. system.time()
告诉我们,在配备 2.5 Ghz Intel i-7 处理器的 MacBook Pro 15 上运行 10,000 次伯努利试验的 5 次不同蒙特卡罗模拟(样本大小为 5)的代码大约需要 38 秒。 Therefore, we expect that the next simulation will take multiple minutes to run.因此,我们预计下一次模拟将需要几分钟才能运行。
We add another level of lapply()
to account for varying the sample size.我们添加了另一个级别的
lapply()
来解释样本大小的变化。 We have also set the adjWald
parameter to FALSE
so we can see how the base Wald confidence interval behaves at p = 0.01 and 0.10.我们还将
adjWald
参数设置为FALSE
,因此我们可以看到基本 Wald 置信区间在 p = 0.01 和 0.10 时的表现。
set.seed(95014)
system.time(sample_simulations <- lapply(c(10, 15, 20, 25, 30, 50,100, 150, 200),function(s_size){
lapply(c(0.01,0.1,0.8),function(p_val){
aSim <- lapply(1:10000,function(x,p_value,sample_size,z_val,adjWald){
trial <- x
successes <- sum(rbinom(sample_size,size=1,prob = p_value))
if(adjWald){
successes <- successes + 2
sample_size <- sample_size + 4
}
observed_p <- sum(successes) / (sample_size)
z_value <- z_val
lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
data.frame(trial,p_value,sample_size,observed_p,z_value,adjWald,lower.Wald,upper.Wald)
},p_val,s_size,1.96,FALSE)
# bind results to 1 data frame & return
do.call(rbind,aSim)
})
}))
Elapsed time on the MacBook Pro was 217.47 seconds, or about 3.6 minutes. MacBook Pro 上的经过时间为 217.47 秒,或约 3.6 分钟。 Given that we ran 27 different Monte Carlo simulations, the code completed one simulation each 8.05 seconds.
鉴于我们运行了 27 次不同的蒙特卡洛模拟,代码每 8.05 秒完成一次模拟。
The final step is to process the list of lists to create an output data frame that summarizes the analysis.最后一步是处理列表列表以创建总结分析的 output 数据框。 We aggregate the content, combine rows into data frames, then bind the resulting list of data frames.
我们聚合内容,将行组合成数据框,然后绑定数据框的结果列表。
summarizedSimulations <- lapply(sample_simulations,function(y){
do.call(rbind,lapply(y,function(x){
p_value <- min(x$p_value)
adjWald <- as.logical(min(x$adjWald))
sample_size <- min(x$sample_size) - (as.integer(adjWald) * 4)
coverage_pct <- (sum(x$p_value > x$lower.Wald &
x$p_value < x$upper.Wald) / 10000)*100
data.frame(p_value,sample_size,adjWald,coverage_pct)
}))
})
results <- do.call(rbind,summarizedSimulations)
One last step, we sort the data by p-value to see how coverage improves as sample size increases.最后一步,我们按 p 值对数据进行排序,以查看覆盖率如何随着样本量的增加而提高。
results[order(results$p_value,results$sample_size),]
...and the output: ...和 output:
> results[order(results$p_value,results$sample_size),]
p_value sample_size adjWald coverage_pct
1 0.01 10 FALSE 9.40
4 0.01 15 FALSE 14.31
7 0.01 20 FALSE 17.78
10 0.01 25 FALSE 21.40
13 0.01 30 FALSE 25.62
16 0.01 50 FALSE 39.65
19 0.01 100 FALSE 63.67
22 0.01 150 FALSE 77.94
25 0.01 200 FALSE 86.47
2 0.10 10 FALSE 64.25
5 0.10 15 FALSE 78.89
8 0.10 20 FALSE 87.26
11 0.10 25 FALSE 92.10
14 0.10 30 FALSE 81.34
17 0.10 50 FALSE 88.14
20 0.10 100 FALSE 93.28
23 0.10 150 FALSE 92.79
26 0.10 200 FALSE 92.69
3 0.80 10 FALSE 88.26
6 0.80 15 FALSE 81.33
9 0.80 20 FALSE 91.88
12 0.80 25 FALSE 88.38
15 0.80 30 FALSE 94.67
18 0.80 50 FALSE 93.44
21 0.80 100 FALSE 92.96
24 0.80 150 FALSE 94.48
27 0.80 200 FALSE 93.98
>
The Monte Carlo simulations illustrate that Wald confidence intervals provide poor coverage at a p-value of 0.01, even with a sample size of 200. Coverage improves at p-value of 0.10, where all but one of the simulations at sample sizes 25 and above exceeded 90%.蒙特卡洛模拟表明,即使样本量为 200,Wald 置信区间在 p 值为 0.01 时提供较差的覆盖率。覆盖率在 p 值为 0.10 时提高,在样本量为 25 及以上的模拟中,除了一个模拟之外的所有模拟超过 90%。 Coverage is even better for the p-value of 0.80, where all but one of the sample sizes above 15 exceeded 91% coverage.
对于 0.80 的 p 值,覆盖率甚至更好,其中除了一个样本大小超过 15 之外,所有样本量都超过了 91% 的覆盖率。
Coverage improves further when we calculate Adjusted Wald confidence intervals, especially at lower p-values.当我们计算调整后的 Wald 置信区间时,覆盖率会进一步提高,尤其是在 p 值较低的情况下。
results[order(results$p_value,results$sample_size),]
p_value sample_size adjWald coverage_pct
1 0.01 10 TRUE 99.75
4 0.01 15 TRUE 98.82
7 0.01 20 TRUE 98.30
10 0.01 25 TRUE 97.72
13 0.01 30 TRUE 99.71
16 0.01 50 TRUE 98.48
19 0.01 100 TRUE 98.25
22 0.01 150 TRUE 98.05
25 0.01 200 TRUE 98.34
2 0.10 10 TRUE 93.33
5 0.10 15 TRUE 94.53
8 0.10 20 TRUE 95.61
11 0.10 25 TRUE 96.72
14 0.10 30 TRUE 96.96
17 0.10 50 TRUE 97.28
20 0.10 100 TRUE 95.06
23 0.10 150 TRUE 96.15
26 0.10 200 TRUE 95.44
3 0.80 10 TRUE 97.06
6 0.80 15 TRUE 98.10
9 0.80 20 TRUE 95.57
12 0.80 25 TRUE 94.88
15 0.80 30 TRUE 96.31
18 0.80 50 TRUE 95.05
21 0.80 100 TRUE 95.37
24 0.80 150 TRUE 94.62
27 0.80 200 TRUE 95.96
The Adjusted Wald confidence intervals provide consistently better coverage across the range of p-values and sample sizes, with an average coverage of 96.72% across the 27 simulations.调整后的 Wald 置信区间在 p 值和样本大小的范围内始终提供更好的覆盖率,在 27 次模拟中平均覆盖率为 96.72%。 This is consistent with the literature that indicates Adjusted Wald confidence intervals are more conservative than unadjusted Wald confidence intervals.
这与表明调整后的 Wald 置信区间比未调整的 Wald 置信区间更保守的文献一致。
At this point we have a working Monte Carlo simulator that produces valid results for multiple p-values and sample sizes.在这一点上,我们有一个有效的蒙特卡罗模拟器,它可以为多个 p 值和样本大小产生有效的结果。 We can now review the code to find opportunities to optimize its performance.
我们现在可以查看代码以寻找优化其性能的机会。
Following the old programming aphorism of Make it work, make it right, make it fast , working the solution out in an iterative manner helped enabled me to develop a solution that produces valid results.遵循Make it work, make it right, make it fast的旧编程格言,以迭代的方式解决方案帮助我开发出产生有效结果的解决方案。
Understanding of how to make it right enabled me not only to see the flaw in the code posted in the question, but it also enabled me to envision a solution.了解如何使它正确不仅使我能够看到问题中发布的代码中的缺陷,而且还使我能够设想解决方案。 That solution, using
rbinom()
once with an argument of m * n
, casting the result as a matrix()
, and then using rowSums()
to calculate p-values, led me to see how I could optimize my own solution by eliminating thousands of rbinom()
calls from each simulation.该解决方案使用
rbinom()
一次,参数为m * n
,将结果转换为matrix()
,然后使用rowSums()
计算 p 值,这让我看到了如何通过消除优化自己的解决方案来自每个模拟的数千个rbinom()
调用。
We create a function, binomialSimulation()
, that generates Bernoulli trials and Wald confidence intervals with a single call to rbinom()
, regardless of the number of trials in a single simulation.我们创建了一个 function,
binomialSimulation()
,它通过一次调用rbinom()
生成伯努利试验和 Wald 置信区间,而不管单个模拟中的试验次数如何。 We also aggregate results so each simulation generates a data frame containing one row describing the results of the test.我们还汇总了结果,因此每次模拟都会生成一个数据框,其中包含一行描述测试结果的行。
set.seed(90125)
binomialSimulation <- function(trial_size,p_value,sample_size,z_value){
trials <- matrix(rbinom(trial_size * sample_size,size=1,prob = p_value),
nrow = trial_size,ncol = sample_size)
observed_p <- rowSums(trials) / sample_size
lower.Wald <- observed_p - z_value * sqrt(observed_p*(1-observed_p)/sample_size)
upper.Wald <- observed_p + z_value * sqrt(observed_p*(1-observed_p)/sample_size)
coverage_pct <- sum(p_value > lower.Wald &
p_value < upper.Wald) / 10000 * 100
data.frame(sample_size,p_value,avg_observed_p=mean(observed_p),coverage_pct)
}
We run the function with a population p-value of 0.5, a sample size of 5, and 10,000 trials and a confidence interval of 95%, and track the execution time with system.time()
.我们运行 function,总体 p 值为 0.5,样本量为 5,试验次数为 10,000 次,置信区间为 95%,并使用
system.time()
跟踪执行时间。 The optimized function is 99.8% faster than the original implementation described earlier in the article, which runs in about 6.09 seconds.经过优化的 function 比本文前面描述的原始实现快 99.8%,后者运行时间约为 6.09 秒。
system.time(binomialSimulation(10000,0.5,5,1.96))
> system.time(binomialSimulation(10000,0.5,5,1.96))
user system elapsed
0.015 0.000 0.015
We will skip the intermediate steps and present the optimized version of the iteratively developed solution.我们将跳过中间步骤并展示迭代开发解决方案的优化版本。
system.time(results <- do.call(rbind,lapply(c(5,10,15,20,25,50,100,250),
function(aSample_size,p_values) {
do.call(rbind,lapply(p_values,function(a,b,c,d){
binomialSimulation(p_value = a,
trial_size = b,
sample_size = aSample_size,
z_value = d)
},10000,5,1.96))
},c(0.1,0.4,0.8))))
As expected, elimination of the thousands of unnecessary calls to rbinom()
radically improves performance of the solution.正如预期的那样,消除对
rbinom()
的数千个不必要的调用从根本上提高了解决方案的性能。
user system elapsed
0.777 0.053 0.830
Given that our prior solution ran in 217 seconds, performance of the optimized version is really impressive.鉴于我们之前的解决方案在 217 秒内运行,优化版本的性能确实令人印象深刻。 Now we have a solution that not only generates accurate Monte Carlo simulations of Bernoulli trials, but it's also fast.
现在我们有了一个解决方案,它不仅可以生成准确的伯努利试验蒙特卡罗模拟,而且速度也很快。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.