
Extract mean and confidence intervals from Seaborn regplot

Given that regplot calculates the mean in each bin and bootstraps to find a confidence interval for each bin, it seems like a waste to recalculate them manually for further study, so:

Question: How do I access the calculated means and confidence intervals of a regplot?

Example: This code produces a nice plot of bin means with CIs:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# just some random numbers to get started
fig, ax = plt.subplots()
x = np.random.uniform(-2, 2, 1000)
y = np.random.normal(x**2, np.abs(x) + 1)

# Manual binning to retain control
binwidth = 4. / 10
x_bins = np.arange(-2 + binwidth / 2, 2, binwidth)
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False)
plt.show()

Result: [Figure: regplot showing binned data with CIs]

Not that calculating the means bin by bin isn't easily doable, but the CIs are calculated using random numbers. It would be nice to have exactly the same numbers accessible as are plotted, so how do I access them? There must be some sort of get_* method I'm overlooking.

Set-up

Setting up as in your MWE:

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Random numbers for plotting
x = np.random.uniform(-2, 2, 1000)
y = np.random.normal(x**2, np.abs(x) + 1)

# Manual binning to retain control
binwidth = 4 / 10
x_bins = np.arange(binwidth/2 - 2, 2, binwidth)
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False)

This gives our starting point as:

[Figure: the OP's MWE]

Extracting the Confidence Intervals

Each confidence interval is drawn as a vertical line, so we can extract the intervals by looping over the plotted lines and taking the minimum and maximum y-values of each (corresponding to the lower and upper CI bounds respectively):

ax = plt.gca()
lower = [line.get_ydata().min() for line in ax.lines]
upper = [line.get_ydata().max() for line in ax.lines]

As a sanity check we can plot these extracted points on top of our original data (shown here by red crosses):

plt.scatter(x_bins, lower, marker='x', color='C3', zorder=3)
plt.scatter(x_bins, upper, marker='x', color='C3', zorder=3)

[Figure: MWE with CIs]

Extracting the Means

The values of the means can be extracted from ax.collections as:
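The overlay above relies on the i-th line in ax.lines belonging to x_bins[i]. A slightly more defensive variant, sketched below, reads the x-position from each line artist as well, so every pair of bounds carries its own bin centre. It assumes that regplot draws each CI as a two-point vertical line and that nothing else has been drawn on the axes; the `ci_x`, `ci_lower` and `ci_upper` names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# Same kind of data as the MWE, seeded here so the sketch is repeatable
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 1000)
y = rng.normal(x**2, np.abs(x) + 1)

binwidth = 4 / 10
x_bins = np.arange(binwidth / 2 - 2, 2, binwidth)

fig, ax = plt.subplots()
sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False, ax=ax)

# Assumption: every line on the axes is one CI bar, i.e. the same x twice
# with the two interval bounds as y-values
ci_x = np.array([line.get_xdata()[0] for line in ax.lines])
ci_lower = np.array([np.min(line.get_ydata()) for line in ax.lines])
ci_upper = np.array([np.max(line.get_ydata()) for line in ax.lines])
```

If a bin happened to contain no data, no bar should be drawn for it, and ci_x would then show which bins are actually present.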

means = ax.collections[0].get_offsets()[:, 1]

Again, as a sanity check we can overlay our extracted values on the original plot:

plt.scatter(x_bins, means, color='C1', marker='x', zorder=3)

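Since the CIs come from a bootstrap, replotting the same data will generally give slightly different bounds. One way around this, sketched here, is to pass regplot's `seed` argument and collect everything in one place. `binned_estimates` is a hypothetical helper written for this example, not part of seaborn, and it again assumes the axes hold only the scatter collection and the CI lines.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns


def binned_estimates(x, y, x_bins, seed=0):
    """Hypothetical helper: draw a binned regplot and read the estimates back."""
    fig, ax = plt.subplots()
    sns.regplot(x=x, y=y, x_bins=x_bins, fit_reg=False, seed=seed, ax=ax)
    # The scatter collection holds one (x, mean) pair per populated bin
    offsets = np.asarray(ax.collections[0].get_offsets())
    # Each line artist is one vertical CI bar with two y-values
    bounds = np.array([line.get_ydata() for line in ax.lines])
    plt.close(fig)
    return {
        "x": offsets[:, 0],
        "mean": offsets[:, 1],
        "lower": bounds.min(axis=1),
        "upper": bounds.max(axis=1),
    }


rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 1000)
y = rng.normal(x**2, np.abs(x) + 1)
x_bins = np.arange(0.4 / 2 - 2, 2, 0.4)

# The same seed should give the same bootstrap draws, hence the same bounds
first = binned_estimates(x, y, x_bins, seed=42)
second = binned_estimates(x, y, x_bins, seed=42)
```

The means do not depend on the seed at all; only the interval bounds do.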
