How do you calculate the confidence interval for Pearson's r in Python?

Question

In Python, I know how to calculate r and the associated p-value using scipy.stats.pearsonr, but I'm unable to find a way to calculate the confidence interval of r. How is this done? Thanks for any help :)

Answer 1

According to [1], calculating a confidence interval directly from Pearson's r is complicated because the sampling distribution of r is not normal. The following steps are needed:

1. Convert r to z' (Fisher's z transformation).
2. Calculate the confidence interval for z'. The sampling distribution of z' is approximately normal, with a standard error of 1/sqrt(n-3), where n is the sample size.
3. Convert the confidence interval back to r.

Here is some sample code:

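import math

from scipy import stats

def r_to_z(r):
    # Fisher's z transformation (equivalent to math.atanh(r))
    return math.log((1 + r) / (1 - r)) / 2.0

def z_to_r(z):
    # Inverse Fisher transformation (equivalent to math.tanh(z))
    e = math.exp(2 * z)
    return (e - 1) / (e + 1)

def r_confidence_interval(r, alpha, n):
    # alpha is the significance level, e.g. 0.05 for a 95% interval;
    # n is the sample size and must be greater than 3
    z = r_to_z(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = stats.norm.ppf(1 - alpha/2)  # 2-tailed z critical value

    lo = z - z_crit * se
    hi = z + z_crit * se

    # Return the (lower, upper) bounds as a tuple
    return (z_to_r(lo), z_to_r(hi))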

Reference:

[1] http://onlinestatbook.com/2/estimation/correlation_ci.html
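As a rough, dependency-free illustration of the three steps above, the sketch below computes r from raw paired data and then the Fisher-transformed interval. The names pearson_r and fisher_ci and the sample data are made up for this example. It assumes statistics.NormalDist from the standard library as a stand-in for scipy.stats.norm, and uses math.atanh and math.tanh, which are the same transformations as r_to_z and z_to_r.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Plain-Python Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, alpha=0.05):
    """Approximate (1 - alpha) confidence interval for r (hypothetical helper)."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                              # step 1: r -> z'
    se = 1.0 / math.sqrt(n - 3)                    # standard error of z'
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed critical value
    lo, hi = z - z_crit * se, z + z_crit * se      # step 2: interval for z'
    return math.tanh(lo), math.tanh(hi)            # step 3: back to r


# Made-up sample data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 2.9, 3.2, 4.8, 5.1, 5.8, 7.4, 7.9]

r = pearson_r(x, y)
lo, hi = fisher_ci(r, len(x), alpha=0.05)
print(f"r = {r:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Note that the interval is not symmetric around r, because the back-transformation is nonlinear. As far as I know, newer SciPy releases (1.9 and later) also expose a confidence_interval() method on the result object returned by scipy.stats.pearsonr; it is worth checking the documentation for your installed version.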

Answer 2

Using rpy2 and the R package psychometric (you will need R installed, and you must first run install.packages("psychometric") within R):

from rpy2.robjects.packages import importr
psychometric = importr('psychometric')
psychometric.CIr(r=.9, n=100, level=.95)

Here 0.9 is your correlation, n is the sample size, and 0.95 is the confidence level.

Answer 3

Here's a solution that uses bootstrapping to compute the confidence interval rather than the Fisher transformation (which assumes bivariate normality, etc.), borrowing from this answer:

import numpy as np


def pearsonr_ci(x, y, ci=95, n_boots=10000):
    x = np.asarray(x)
    y = np.asarray(y)
    
    # (n_boots, n_observations) paired arrays
    rand_ixs = np.random.randint(0, x.shape[0], size=(n_boots, x.shape[0]))
    x_boots = x[rand_ixs]
    y_boots = y[rand_ixs]
    
    # differences from mean
    x_mdiffs = x_boots - x_boots.mean(axis=1)[:, None]
    y_mdiffs = y_boots - y_boots.mean(axis=1)[:, None]
    
    # sums of squares
    x_ss = np.einsum('ij, ij -> i', x_mdiffs, x_mdiffs)
    y_ss = np.einsum('ij, ij -> i', y_mdiffs, y_mdiffs)
    
    # pearson correlations
    r_boots = np.einsum('ij, ij -> i', x_mdiffs, y_mdiffs) / np.sqrt(x_ss * y_ss)
    
    # upper and lower bounds for confidence interval
    ci_low = np.percentile(r_boots, (100 - ci) / 2)
    ci_high = np.percentile(r_boots, (ci + 100) / 2)
    return ci_low, ci_high
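To show how a percentile bootstrap like the one above might be called, here is a small self-contained sketch. It is not the answer's exact function: bootstrap_pearsonr_ci is a hypothetical variant that takes a seed (via np.random.default_rng) so results are repeatable, and the data are synthetic values generated only for this example.

```python
import numpy as np


def bootstrap_pearsonr_ci(x, y, ci=95, n_boots=2000, seed=0):
    """Percentile-bootstrap CI for Pearson's r (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)  # seeded so results are repeatable

    n = x.shape[0]
    # Resample (x, y) pairs together: one row of indices per bootstrap sample
    ixs = rng.integers(0, n, size=(n_boots, n))
    xb, yb = x[ixs], y[ixs]

    # Row-wise Pearson correlation of each resample
    xd = xb - xb.mean(axis=1, keepdims=True)
    yd = yb - yb.mean(axis=1, keepdims=True)
    ss = (xd ** 2).sum(axis=1) * (yd ** 2).sum(axis=1)
    r_boots = (xd * yd).sum(axis=1) / np.sqrt(ss)

    lo, hi = np.percentile(r_boots, [(100 - ci) / 2, (100 + ci) / 2])
    return lo, hi


# Synthetic, made-up data: y depends linearly on x plus noise
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = 0.6 * x + rng.normal(scale=0.8, size=200)

r = np.corrcoef(x, y)[0, 1]
lo, hi = bootstrap_pearsonr_ci(x, y)
print(f"r = {r:.3f}, bootstrap 95% CI = ({lo:.3f}, {hi:.3f})")
```

The key detail is that x and y are indexed with the same resampled indices, so each observation pair stays intact. With very small samples, a resample can have zero variance in x or y and produce a NaN correlation, which this sketch does not handle.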

Answer 4

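The answer given by bennylp is mostly correct; however, there is a small error in calculating the critical value in the third function, at least when alpha is passed as the confidence level (e.g. 0.95). If alpha is instead the significance level (e.g. 0.05), the original 1 - alpha/2 gives the same critical value.

With alpha as the confidence level, it should instead be: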

def r_confidence_interval(r, alpha, n):
    z = r_to_z(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = stats.norm.ppf((1 + alpha)/2)  # 2-tailed z critical value

    lo = z - z_crit * se
    hi = z + z_crit * se

    # Return a sequence
    return (z_to_r(lo), z_to_r(hi))

Here's another post for reference: Scipy - two tail ppf function for a z value?
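The two ways of writing the critical value differ only in what the argument means. The short check below is a sketch, using statistics.NormalDist from the standard library as a stand-in for scipy.stats.norm.ppf. It compares the significance-level form with the confidence-level form; the variable names are chosen for this example.

```python
from statistics import NormalDist

confidence = 0.95          # confidence level
alpha = 1 - confidence     # corresponding significance level

# Form used when the argument is the significance level
z_from_alpha = NormalDist().inv_cdf(1 - alpha / 2)
# Form used when the argument is the confidence level
z_from_confidence = NormalDist().inv_cdf((1 + confidence) / 2)

print(z_from_alpha, z_from_confidence)  # both are roughly 1.96
```

So whichever version of r_confidence_interval you use, make sure the value you pass matches what that version expects: 0.05 for the 1 - alpha/2 form, 0.95 for the (1 + alpha)/2 form.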

Answer 5

I know bootstrapping has been suggested above; I'm proposing another variation of it below, which may suit some other setups better.

#1 Resample your data with replacement (paired Xs and Ys; you can also include other variables, such as weights), apply the original fit to each resample, and record the resulting R2. Then extract your confidence interval from the distribution of all recorded R2 values.

#2 Additionally, you can fit the model on each resample and use that fit to predict on the original (non-resampled) X (you could also supply a continuous range to extend your predictions instead of using the original X) to get confidence intervals on your Y hats (predicted values).

So, in sample code:

import numpy as np
from scipy.optimize import curve_fit
import pandas as pd
from sklearn.metrics import r2_score


x = np.array([your numbers here])
y = np.array([your numbers here])


### define list for R2 values
r2s = []

### define dataframe to append your bootstrapped fits for Y hat ranges
ci_df = pd.DataFrame({'x': x})

### define how many samples you want
how_many_straps = 5000

### define your fit function/s
def func_exponential(x,a,b):
    return np.exp(b) * np.exp(a * x)

### fit original, using log because fitting exponential
polyfit_original = np.polyfit(x,
                              np.log(y),
                              1,
                              # w=... could supply weights for observations here
                              )

for i in range(how_many_straps):

    ### zip into tuples attaching X to Y, can combine more variables as well
    zipped_for_boot = pd.Series(tuple(zip(x,y)))

    ### sample zipped X & Y pairs above with replacement
    zipped_resampled = zipped_for_boot.sample(frac=1, 
                                              replace=True)

    ### create your sampled X & Y
    boot_x = []
    boot_y = []
    
    for sample in zipped_resampled:
        boot_x.append(sample[0])
        boot_y.append(sample[1])
     
    ### predict sampled using original fit
    y_hat_boot_via_original_fit = func_exponential(np.asarray(boot_x),
                                                   polyfit_original[0], 
                                                   polyfit_original[1])       
    
    ### calculate r2 and append
    r2s.append(r2_score(boot_y,  y_hat_boot_via_original_fit))
    
    
    ### fit sampled
    polyfit_boot = np.polyfit(boot_x,
                              np.log(boot_y),
                              1,
                              # w=... could supply weights for observations here
                              )

        
    ### predict original via sampled fit or on a range of min(x) to Z
    y_hat_original_via_sampled_fit = func_exponential(x,
                                                      polyfit_boot[0], 
                                                      polyfit_boot[1])     
    

    ### insert y hat into dataframe for calculating y hat confidence intervals
    ci_df["trial_" + str(i)] = y_hat_original_via_sampled_fit
  

### R2 conf interval (2.5th and 97.5th percentiles give a 95% interval)
low, up = pd.Series(r2s).quantile([0.025, 0.975]).round(3).tolist()
print(f"r2 confidence interval = {low} - {up}")
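The code above collects the bootstrapped Y-hat predictions but stops at the R2 interval. As a hedged sketch of idea #2, the pointwise confidence band for the predictions could be extracted with percentiles across resamples. The data below are synthetic, and the parameter values (0.5, 0.4, noise scale 0.1, 1000 resamples) are arbitrary choices for illustration. A 2-D NumPy array with one row per resample is used here in place of the DataFrame with one column per trial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data following y = exp(b) * exp(a * x) with multiplicative noise
x = np.linspace(0.0, 5.0, 30)
y = np.exp(0.5) * np.exp(0.4 * x) * np.exp(rng.normal(scale=0.1, size=x.size))

n_boots = 1000
preds = np.empty((n_boots, x.size))  # one row of predictions per resample

for i in range(n_boots):
    ixs = rng.integers(0, x.size, size=x.size)    # resample (x, y) pairs
    a, b = np.polyfit(x[ixs], np.log(y[ixs]), 1)  # log-linear fit on resample
    preds[i] = np.exp(b) * np.exp(a * x)          # predict on the original x

# Pointwise 95% band for the fitted curve at each x
y_hat_low, y_hat_high = np.percentile(preds, [2.5, 97.5], axis=0)
print(y_hat_low[:3], y_hat_high[:3])
```

This band describes uncertainty in the fitted curve, not in individual new observations, so it will typically be narrower than a prediction interval.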

5 answers

Answer 1 (9, accepted) 2017-09-08 03:35:14
Answer 2 (2) 2017-01-24 20:54:52
Answer 3 (1) 2020-10-14 21:55:19
Answer 4 (0) 2020-01-16 23:51:01
Answer 5 (0) 2021-12-17 11:54:50
