Confidence interval for the difference between two proportions in Python

For example, in an A/B test, population A could have 1000 data points, of which 100 are successes, while B could have 2000 data points and 220 successes. This gives A a success proportion of 0.1 and B a proportion of 0.11, a delta of 0.01. How can I calculate a confidence interval around this delta in Python?

statsmodels can do this for one sample, but it does not seem to have a function for the difference between two samples, which is what an A/B test needs. (http://www.statsmodels.org/dev/generated/statsmodels.stats.proportion.proportion_confint.html)

I couldn't find a function for this in statsmodels. However, this website goes over the maths for generating the confidence interval, and it is also the source of the function below:

import numpy as np
from scipy import stats


def two_proportions_confint(success_a, size_a, success_b, size_b, significance=0.05):
    """
    A/B test for two proportions:
    given the number of successes and the trial size of groups A and B,
    compute the confidence interval for the difference in proportions
    (B minus A); the resulting interval matches R's prop.test function
    when called with correct = FALSE

    Parameters
    ----------
    success_a, success_b : int
        Number of successes in each group

    size_a, size_b : int
        Size, or number of observations in each group

    significance : float, default 0.05
        Often denoted as alpha. Governs the chance of a false positive.
        A significance level of 0.05 means that there is a 5% chance of
        a false positive. In other words, our confidence level is
        1 - 0.05 = 0.95

    Returns
    -------
    prop_diff : float
        Difference between the two proportions (prop_b - prop_a)

    confint : 1d ndarray
        Confidence interval for the difference, as [lower, upper]
    """
    prop_a = success_a / size_a
    prop_b = success_b / size_b
    var = prop_a * (1 - prop_a) / size_a + prop_b * (1 - prop_b) / size_b
    se = np.sqrt(var)

    # two-sided z critical value, e.g. about 1.96 for significance = 0.05
    z = stats.norm.ppf(1 - significance / 2)

    # standard formula for the confidence interval:
    # point-estimate +- z * standard-error
    prop_diff = prop_b - prop_a
    confint = prop_diff + np.array([-1, 1]) * z * se
    return prop_diff, confint

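The sample sizes don't have to be equal. The confidence interval for the difference between two proportions is

$$(p_1 - p_2) \pm z_{\alpha/2} \sqrt{\frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2}}$$

where p1 and p2 are the observed proportions, computed over their respective samples of sizes n1 and n2, and z_{\alpha/2} is the standard normal critical value (about 1.96 for a 95% interval).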

For more, please see this white paper.
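As a rough illustration, the interval described above can be worked through with the numbers from the question using only the Python standard library. The helper name `wald_diff_ci` is made up for this sketch and does not come from any library; it assumes the plain normal-approximation (Wald) interval with no continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def wald_diff_ci(success_1, n_1, success_2, n_2, alpha=0.05):
    """Wald interval for p1 - p2 (hypothetical helper, for illustration only)."""
    p1 = success_1 / n_1
    p2 = success_2 / n_2
    # Standard error of the difference between two independent proportions
    se = sqrt(p1 * (1 - p1) / n_1 + p2 * (1 - p2) / n_2)
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)


# Numbers from the question: B has 220 successes out of 2000, A has 100 out of 1000
diff, (low, high) = wald_diff_ci(220, 2000, 100, 1000)
print(f"difference = {diff:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

With these inputs the interval runs from roughly -0.013 to 0.033, so it contains zero and the observed delta of 0.01 is not distinguishable from no difference at the 5% level.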

The statsmodels package now has confint_proportions_2indep, which computes a confidence interval for comparing two proportions. See the documentation for details: https://www.statsmodels.org/stable/generated/statsmodels.stats.proportion.confint_proportions_2indep.html
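A minimal sketch of how that function might be called on the question's numbers, assuming a statsmodels release recent enough to include it. The function computes the interval for the first sample minus the second, so B is passed first here to get B minus A. `method="wald"` is chosen only so the result is comparable with the hand-written formula; when `method` is left unset, statsmodels uses a different default for differences. The import is guarded so the snippet still runs where statsmodels is not installed.

```python
try:
    from statsmodels.stats.proportion import confint_proportions_2indep
except ImportError:  # statsmodels is an optional third-party dependency
    confint_proportions_2indep = None

if confint_proportions_2indep is not None:
    # count1/nobs1 describe group B, count2/nobs2 describe group A,
    # so the interval is for prop_b - prop_a
    low, upp = confint_proportions_2indep(
        220, 2000, 100, 1000, method="wald", compare="diff", alpha=0.05
    )
    print(f"95% CI for B - A: ({low:.4f}, {upp:.4f})")
else:
    print("statsmodels is not installed; skipping the example")
```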
