What is the correct way to compute 95% confidence intervals for classification and regression with PyTorch?

Question

I wanted to report 90, 95, 99, etc. confidence intervals on my data using PyTorch. But confidence intervals seem too important to leave my implementation untested or uncriticized, so I wanted feedback; it should be checked by at least some expert. Furthermore, I already noticed I get NaN values when my values are negative, which makes me think my code only works for classification (at the very least), but I also do regression. I am also surprised that using the NumPy code directly actually gave me differentiable tensors... not something I was expecting.

So is this correct?

import numpy as np
import scipy.stats
import torch
from torch import Tensor

P_CI = {0.90: 1.64,
        0.95: 1.96,
        0.98: 2.33,
        0.99: 2.58,
        }


def mean_confidence_interval_rfs(data, confidence=0.95):
    """
    https://stackoverflow.com/a/15034143/1601580
    """
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, h


def mean_confidence_interval(data, confidence=0.95):
    a = 1.0 * np.array(data)
    n = len(a)
    m, se = np.mean(a), scipy.stats.sem(a)
    h = se * scipy.stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m - h, m + h


def ci(a, p=0.95):
    import numpy as np, scipy.stats as st
    return st.t.interval(p, len(a) - 1, loc=np.mean(a), scale=st.sem(a))  # (lower, upper)


# def ci(a, p=0.95):
#     import statsmodels.stats.api as sms
#
#     return sms.DescrStatsW(a).tconfint_mean(alpha=1 - p)

def compute_confidence_interval_classification(data: Tensor,
                                               by_pass_30_data_points: bool = False,
                                               p_confidence: float = 0.95
                                               ) -> Tensor:
    """
    Computes CI interval
        [B] -> [1]
    According to [1], the confidence interval for classification error can be calculated as follows:
        error +/- const * sqrt( (error * (1 - error)) / n)

    The values for const are provided from statistics, and common values used are:
        1.64 (90%)
        1.96 (95%)
        2.33 (98%)
        2.58 (99%)
    Assumptions:
    Use of these confidence intervals makes some assumptions that you need to ensure you can meet. They are:

    Observations in the validation data set were drawn from the domain independently (i.e. they are independent and
    identically distributed).
    At least 30 observations were used to evaluate the model.
    This is based on some statistics of sampling theory that takes calculating the error of a classifier as a binomial
    distribution, that we have sufficient observations to approximate a normal distribution for the binomial
    distribution, and that via the central limit theorem that the more observations we classify, the closer we will get
    to the true, but unknown, model skill.

    Ref:
        - computed according to: https://machinelearningmastery.com/report-classifier-performance-confidence-intervals/

    todo:
        - how does it change for other types of losses
    """
    B: int = data.size(0)
    # assert data >= 0
    assert B >= 30 or by_pass_30_data_points, f'Not enough data for CI calc to be valid and approximate a ' \
                                               f'normal, you have: {B=} but needed 30.'
    const: float = P_CI[p_confidence]
    error: Tensor = data.mean()
    val: Tensor = torch.sqrt((error * (1 - error)) / B)  # NaN when error is outside [0, 1]
    print(val)
    ci_interval: Tensor = const * val
    return ci_interval


def compute_confidence_interval_regression():
    """
    todo
    :return:
    """
    raise NotImplementedError


# - tests

def ci_test():
    x: Tensor = abs(torch.randn(35))
    ci_pytorch = compute_confidence_interval_classification(x)
    ci_rfs = mean_confidence_interval(x)
    print(f'{x.var()=}')
    print(f'{ci_pytorch=}')
    print(f'{ci_rfs=}')

    x: Tensor = abs(torch.randn(35, requires_grad=True))
    ci_pytorch = compute_confidence_interval_classification(x)
    ci_rfs = mean_confidence_interval(x)
    print(f'{x.var()=}')
    print(f'{ci_pytorch=}')
    print(f'{ci_rfs=}')

    x: Tensor = torch.randn(35) - 10
    ci_pytorch = compute_confidence_interval_classification(x)
    ci_rfs = mean_confidence_interval(x)
    print(f'{x.var()=}')
    print(f'{ci_pytorch=}')
    print(f'{ci_rfs=}')


if __name__ == '__main__':
    ci_test()
    print('Done, success! \a')

Output:

tensor(0.0758)
x.var()=tensor(0.3983)
ci_pytorch=tensor(0.1486)
ci_rfs=(tensor(0.8259), tensor(0.5654), tensor(1.0864))
tensor(0.0796, grad_fn=<SqrtBackward>)
x.var()=tensor(0.4391, grad_fn=<VarBackward>)
ci_pytorch=tensor(0.1559, grad_fn=<MulBackward0>)
Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/pydevd.py", line 1483, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/metrics/metrics.py", line 154, in <module>
    ci_test()
  File "/Users/brandomiranda/ultimate-utils/ultimate-utils-proj-src/uutils/torch_uu/metrics/metrics.py", line 144, in ci_test
    ci_pytorch = compute_confidence_interval_classification(x, by_pass_30_data_points)

How do I fix the code above for regression, e.g. negative values of arbitrary magnitude?

I'm somewhat surprised there isn't an implementation already, and especially not an official PyTorch one, given how important CIs are supposed to be... perhaps a bad habit in deep learning? Unfortunately, I've rarely seen them in papers.


Answer 1

tl;dr

Confidence intervals (CI) give:

an interval around the sample mean that, if you repeated the sampling many times, would contain the true mean with the stated probability (usually written mu_n +- ci).

Assumptions:

traditional confidence interval statements only hold when the value we want to estimate (parameter, random quantity, etc.) is the mean
you have enough samples for the analysis to hold (e.g. for the sample mean $\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i$, n >= 30 is recommended)
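For reference, the interval these assumptions lead to can be written as follows. Here $s_n$ is the unbiased sample standard deviation and $t_{(1+p)/2,\,n-1}$ is the two-sided critical value of Student's t distribution with $n-1$ degrees of freedom for confidence level $p$; this notation is added for illustration and is not from the original answer.

```latex
\mu_n = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s_n = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left(x_i - \mu_n\right)^2}, \qquad
\mu_n \pm t_{(1+p)/2,\,n-1} \cdot \frac{s_n}{\sqrt{n}}
```

As $n$ grows, $t_{(1+p)/2,\,n-1}$ approaches the normal quantile (about 1.96 for $p = 0.95$).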

If those assumptions hold (**i.e. you are estimating the true mean via the sample mean with a +- value**), then you can use the code below, torch_compute_confidence_interval, for regression, classification, or anything else you want.
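As a minimal sketch of what "anything you want" can mean in practice, assume you already have one number per evaluation example: for classification a 0/1 error indicator, and for regression a signed residual that may be negative and of any magnitude. The helper name `t_confidence_interval` and the synthetic data are hypothetical; the formula is the same t-based one that torch_compute_confidence_interval below uses, so negative values do not produce NaN the way `sqrt(error * (1 - error))` does in the question.

```python
from typing import Tuple

import scipy.stats
import torch
from torch import Tensor


def t_confidence_interval(values: Tensor, confidence: float = 0.95) -> Tuple[Tensor, Tensor]:
    """Hypothetical helper: returns (mean, half_width), i.e. the CI is mean +- half_width."""
    n = values.numel()
    mean = values.mean()
    se = values.std() / n ** 0.5  # std() defaults to the unbiased (n - 1) estimator
    t_crit = float(scipy.stats.t.ppf((1 + confidence) / 2.0, n - 1))  # two-sided critical value
    return mean, t_crit * se


torch.manual_seed(0)

# Classification: 1.0 where the prediction is wrong, 0.0 where it is right.
logits = torch.randn(100, 5)
labels = torch.randint(0, 5, (100,))
errors = (logits.argmax(dim=1) != labels).float()
cls_mean, cls_ci = t_confidence_interval(errors)

# Regression: signed residuals centred around -10, so almost all values are negative.
preds = torch.randn(100) - 10.0
targets = torch.zeros(100)
residuals = preds - targets
reg_mean, reg_ci = t_confidence_interval(residuals)

print(f'classification error: {cls_mean.item():.3f} +- {cls_ci.item():.3f}')
print(f'regression residual:  {reg_mean.item():.3f} +- {reg_ci.item():.3f}')
```

Because the computation stays in torch, gradients would flow through `mean` and the half-width if `values` required grad; only the scalar t quantile comes from SciPy.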

First, as far as I know, confidence intervals (CIs) are an open research problem in deep learning (DL), so more sophisticated answers probably exist. But I will provide a practical answer that I plan to use (and that I see others use when reporting results in DL).

To compute confidence intervals, we first have to understand a little about what a CI is. It is a probabilistic statement, over random surveys/samples of the data set, about how often the reported interval contains the mean you are trying to report. So when people say:

mean_error +- CI for p=95%

it means that if you sampled 100 data sets and computed an interval from each, you would expect about 95 of those intervals to contain the true mean (but you wouldn't know which ones, so you can't say of any specific interval you computed that the mean is inside it).
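One way to see this repeated-sampling reading is to simulate it: draw many data sets from a distribution whose true mean is known, build a 95% interval from each, and count how often the interval contains the true mean. The sketch below assumes a normal distribution with mean 2.0 and standard deviation 3.0, chosen only for illustration, and uses SciPy's `t.interval`; the fraction should come out close to 0.95 but not exactly equal to it.

```python
import numpy as np
import scipy.stats


def covers(sample: np.ndarray, true_mean: float, confidence: float = 0.95) -> bool:
    """Return True if the t-based CI built from `sample` contains `true_mean`."""
    n = len(sample)
    lo, hi = scipy.stats.t.interval(confidence, n - 1,
                                    loc=sample.mean(), scale=scipy.stats.sem(sample))
    return bool(lo <= true_mean <= hi)


rng = np.random.default_rng(0)
true_mean = 2.0
n_datasets, n_points = 2000, 50

# Each call draws one independent data set and checks its interval.
hits = [covers(rng.normal(loc=true_mean, scale=3.0, size=n_points), true_mean)
        for _ in range(n_datasets)]
coverage = float(np.mean(hits))
print(f'fraction of intervals containing the true mean: {coverage:.3f}')
```

Any single interval either contains 2.0 or it does not; the 95% describes the procedure across many data sets.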

This means you can only use it for reporting means. This is because the maths behind it (which isn't very hard) approximates the probability that the bound (the confidence interval) holds by taking advantage of the fact that, by the central limit theorem (CLT), sample means are approximately normally distributed, so their probabilities can be computed analytically. So the specific CI computed here assumes the quantity you want to report is a sample mean, and computes your +- numbers using this normal approximation. Thus, it's usually recommended to have n>=30 data points in the specific data set you are using, but things can still work out nicely with fewer points, since the CI can be computed with a t distribution instead of a normal (denoted z in stats software).
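The difference between the t and normal (z) critical values can be checked directly with SciPy. In this small illustrative sketch (the sample sizes are arbitrary), the t quantile is noticeably larger than 1.96 for small n, which widens the interval, and it approaches the normal value as n grows.

```python
import scipy.stats

confidence = 0.95
q = (1 + confidence) / 2  # two-sided: (1 - confidence) / 2 in each tail

z_crit = scipy.stats.norm.ppf(q)  # about 1.96
for n in (5, 10, 30, 100, 1000):
    t_crit = scipy.stats.t.ppf(q, n - 1)  # degrees of freedom = n - 1
    print(f'n={n:5d}  t={t_crit:.3f}  z={z_crit:.3f}')
```

The constants in the question's P_CI table (1.64, 1.96, 2.33, 2.58) are these normal quantiles rounded to two decimals, so they correspond to the large-n limit.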

Given those assumptions, you can simply do the following:

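import scipy.stats
import torch
from typing import Tuple
from torch import Tensor


def torch_compute_confidence_interval(data: Tensor,
                                      confidence: float = 0.95
                                      ) -> Tuple[Tensor, Tensor]:
    """
    Computes the confidence interval for a given survey of a data set.
    Returns (mean, ci), i.e. the interval is mean +- ci.
    """
    n = len(data)
    mean: Tensor = data.mean()
    # se = scipy.stats.sem(data)  # NumPy/SciPy alternative for the standard error
    # std, mean = torch.std_mean(data, unbiased=True); se = std / n ** 0.5  # fused alternative
    se: Tensor = data.std(unbiased=True) / (n ** 0.5)  # standard error of the mean (stays differentiable)
    t_p: float = float(scipy.stats.t.ppf((1 + confidence) / 2., n - 1))  # two-sided t critical value
    ci = t_p * se
    return mean, ci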

I've tested it and compared it to code specialized for classification, and the values agree up to 1e-2, so the code works. Output:

x_bernoulli.std()=tensor(0.5040)
ci_95=0.1881992999915952
ci_95_cls=tensor(0.1850)
ci_95_anything=tensor(0.1882)
x_bernoulli.std()=tensor(0.5085, grad_fn=<StdBackward>)
ci_95_torch=tensor(0.1867, grad_fn=<MulBackward0>)
x.std()=tensor(0.9263)
ci_95=0.3458867459004733
ci_95_torch=tensor(0.3459)
x.std()=tensor(1.0181, grad_fn=<StdBackward>)
ci_95_torch=tensor(0.3802, grad_fn=<MulBackward0>)
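The agreement with the classification-specific formula is expected: for 0/1 data the unbiased sample variance equals n / (n - 1) * error * (1 - error), so the t-based interval and the binomial normal-approximation interval from the question differ only by the n versus n - 1 divisor and the t versus z quantile. A minimal sketch of that comparison, assuming synthetic Bernoulli data with p = 0.3 (the data and seed are arbitrary):

```python
import scipy.stats
import torch

torch.manual_seed(0)
n = 100
x = torch.bernoulli(torch.full((n,), 0.3))  # 1 = error, 0 = correct
p_hat = x.mean()

# Binomial / normal approximation (the formula used in the question).
ci_binomial = 1.96 * torch.sqrt(p_hat * (1 - p_hat) / n)

# Generic t-based interval (the approach in this answer); std() is unbiased by default.
t_crit = float(scipy.stats.t.ppf(0.975, n - 1))
ci_t = t_crit * x.std() / n ** 0.5

print(f'{ci_binomial=}, {ci_t=}')
```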

For more details, see my ultimate-utils library, where I comment on the maths in the docs: https://github.com/brando90/ultimate-utils/blob/e81a8c3c4425b33e00b3ade172705f20b626b2b1/ultimate-utils-proj-src/uutils/torch_uu/metrics/confidence_intervals.py#L1

Comments on DL

If you are reporting the error of a specific model, e.g. a neural net, then you are more or less reporting that the true mean error of that very specific neural net and its weights lies within those bounds. But as I said, this is an open research area, so fancier approaches must be available, e.g. ones that account for some layers actually being random, etc.
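Concretely, for one fixed set of weights the data the CI is computed over are the per-example losses on the evaluation set. A minimal sketch, assuming a toy `nn.Linear` regression model and random evaluation data (both hypothetical), that uses `reduction='none'` to keep one loss per example before applying the same t-based formula:

```python
import scipy.stats
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
model = nn.Linear(3, 1)  # stands in for "a specific neural net and weights"
x_eval = torch.randn(200, 3)
y_eval = torch.randn(200, 1)

with torch.no_grad():
    # One squared error per example instead of a single averaged loss.
    per_example = F.mse_loss(model(x_eval), y_eval, reduction='none').squeeze(1)

n = per_example.numel()
mean_loss = per_example.mean()
t_crit = float(scipy.stats.t.ppf(0.975, n - 1))
ci = t_crit * per_example.std() / n ** 0.5  # unbiased std / sqrt(n) = standard error
print(f'test MSE: {mean_loss.item():.3f} +- {ci.item():.3f} (95% CI)')
```

This interval only reflects sampling of the evaluation examples; it does not capture variation you would see by retraining with a different seed.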

Solution 1
0 2021-12-17 17:40:06
