
Confidence intervals via bootstrapping

Original post: 2018-08-30 20:23:15 · Tags: python, statistics-bootstrap

Yesterday I began reading about using bootstrapping to determine confidence intervals (CIs) in many situations. My current situation is that I am trying to estimate three parameters in a model via maximum likelihood estimation (MLE). I have done this, and now I need to define my CIs. This can obviously be done via profile likelihood, but as far as I can tell, bootstrapping will give a broader CI. My problem is that I am unsure how to actually perform the bootstrapping. I have written my own code for the parameter estimation, so I am not using any built-in MLE calculators.

Basically, the observed data I have are binary (1 or 0), and it is from those data (put into a model with three parameters) that I have tried to estimate the parameter values.

So let's say my cohort is 500. Is the idea then that I take a sample from my cohort, maybe 100, expand it to 500 again by repeating the sample 5 times, and run the simulation once again, which in turn should give some new parameter estimates? And then I do this 1000-2000 times to get a series of parameter values, which can then be used to define the CI?

Or am I missing something here?

1 answer

This question isn't related to Python. I think you need to read an intro to bootstrapping; "An Introduction to Statistical Learning" provides a good one. The idea is not to sample 100: you must sample with replacement and keep the same sample size (500). Yes, you then re-estimate your parameters many times. There are then several ways of taking all of these estimates and turning them into a confidence interval. For example, you can use them to estimate the standard error (the standard deviation of the sampling distribution), and then use +/- 2*se.
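To make the procedure concrete, here is a minimal sketch in plain Python, under stated assumptions. The asker's three-parameter model is not shown in the question, so `estimate_parameters` is a hypothetical stand-in (the Bernoulli MLE, i.e. the sample mean) that you would replace with your own MLE routine. The data are synthetic, and all function names are made up for illustration. The sketch resamples the full cohort with replacement, re-estimates each time, and builds the +/- 2*se interval described above; a percentile interval is included as one other common option.

```python
import random
import statistics


def estimate_parameters(sample):
    """Hypothetical placeholder estimator (an assumption, not the asker's model):
    the MLE of a Bernoulli success probability, which is the sample mean.
    Replace this with your own MLE code."""
    return sum(sample) / len(sample)


def bootstrap_estimates(data, estimator, n_boot=1000, seed=0):
    """Re-estimate on n_boot resamples, each drawn WITH replacement
    and each the same size as the original data."""
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=n)  # sampling with replacement
        estimates.append(estimator(resample))
    return estimates


def normal_ci(point_estimate, estimates, z=2.0):
    """Interval of the form estimate +/- z * se, where se is the standard
    deviation of the bootstrap estimates."""
    se = statistics.stdev(estimates)
    return point_estimate - z * se, point_estimate + z * se


def percentile_ci(estimates, alpha=0.05):
    """Percentile interval: take the alpha/2 and 1 - alpha/2 empirical
    quantiles of the bootstrap estimates (simple index-based version)."""
    ordered = sorted(estimates)
    last = len(ordered) - 1
    lo_idx = min(last, max(0, int((alpha / 2) * len(ordered))))
    hi_idx = min(last, max(0, int((1 - alpha / 2) * len(ordered)) - 1))
    return ordered[lo_idx], ordered[hi_idx]


# Synthetic cohort of 500 binary observations (illustrative data only).
data_rng = random.Random(42)
data = [1 if data_rng.random() < 0.3 else 0 for _ in range(500)]

theta_hat = estimate_parameters(data)
estimates = bootstrap_estimates(data, estimate_parameters, n_boot=1000, seed=1)

print("point estimate:", theta_hat)
print("+/- 2*se interval:", normal_ci(theta_hat, estimates))
print("percentile interval:", percentile_ci(estimates))
```

With a three-parameter model, the estimator would return three values per resample, and you would compute an interval for each parameter separately from its own column of bootstrap estimates.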