
Drawing regression line, confidence interval, and prediction interval in Python

I'm new to regression and would like to plot a nonlinear regression line of arbitrary functional form (plus confidence and prediction intervals) for a subset of the data that satisfies a certain condition (i.e., the mean of the replicates exceeds a threshold; see below).

The data are generated for the independent variable x at 20 different values, x=(20-np.arange(20))**2, with rep_num=10 replicates for each condition. The data show strong nonlinearity across x and look like this:

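import numpy as np

rep_num = 10                      # replicates per condition
x = (20 - np.arange(20))**2       # the 20 values of the independent variable

mu = [.40, .38, .39, .35, .37, .33, .34, .28, .11, .24,
      .03, .07, .01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

# Only the first 13 conditions get noisy values; the remaining rows stay at zero.
data = np.zeros((20, rep_num))
for i in range(13):
    data[i] = np.clip(np.random.normal(loc=mu[i], scale=0.1, size=rep_num), 0., 1.)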

I can make a scatter plot of the data; the replicate means are shown as red dots:

import matplotlib.pyplot as plt

plt.scatter(np.log10(np.tile(x[:,None], rep_num)), data, 
            facecolors='none', edgecolors='k', alpha=0.25)
plt.plot(np.log10(x), data.mean(1), 'ro', alpha=0.8)
plt.plot(np.log10(x), np.repeat(0., 20), 'k--')
plt.xlim(-0.02, np.max(np.log10(x)) + 0.02)
plt.ylim(-0.01, 0.7)

[Figure: scatter plot of the data]

My goal is to plot a regression line only for the data whose replicate mean is > 0.02. In addition, I would like to add a 95% confidence interval (black dashed lines) around the regression, as well as a 95% prediction interval (blue dashed lines). Ideally, the area inside the prediction interval would also be shaded with a transparent blue background.

The final plot (without the blue background inside the prediction interval) would look something like this:

[Figure: desired final plot]

How would I make this? My online search turned up very different partial approaches using seaborn, scipy, and statsmodels, and some of those template functions did not seem to work alongside the existing matplotlib scatter plot.

OK, here's a shot at this (without a prediction band, though). First, select the applicable data:

threshold = 0.02
means = data.mean(1)          # replicate mean of each condition
keep = means > threshold      # conditions that enter the fit
reg_x = np.log10(x)[keep]
reg_y = means[keep]
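The selection above keeps one point per condition (the replicate mean), so the fit and any band derived from it describe condition means. If the prediction interval is meant to cover individual replicates, as in the question's scatter plot, one option is to keep every replicate of the selected conditions instead. A minimal sketch: the seeded generator and the names keep, rep_x and rep_y are my own, and the data are regenerated here only so the block runs on its own.

```python
import numpy as np

# Regenerate data shaped like the question's (20 conditions x rep_num replicates).
rng = np.random.default_rng(0)
rep_num = 10
x = (20 - np.arange(20))**2
mu = np.array([.40, .38, .39, .35, .37, .33, .34, .28, .11, .24,
               .03, .07, .01, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
data = np.zeros((20, rep_num))
data[:13] = np.clip(rng.normal(mu[:13, None], 0.1, size=(13, rep_num)), 0., 1.)

threshold = 0.02
keep = data.mean(1) > threshold           # boolean mask over the 20 conditions

# One point per condition (what the answer fits):
reg_x = np.log10(x)[keep]
reg_y = data.mean(1)[keep]

# Every replicate of the kept conditions, flattened to 1-D arrays:
rep_x = np.repeat(np.log10(x)[keep], rep_num)
rep_y = data[keep].ravel()
```

rep_x and rep_y can be passed to the same fitting call in place of reg_x and reg_y.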

Then choose a model and perform a fit. Here I chose a second-order polynomial, but in principle you could use any model. For the fit I use kapteyn, which has a built-in confidence band method, although it would be straightforward to implement one yourself (see, e.g., the delta method).

from kapteyn import kmpfit

# Set model to fit.
def model(p, x):
    a, b, c = p
    return a + b*x + c*x**2

# Perform fit.
f = kmpfit.simplefit(model, [.1, .1, .1], reg_x, reg_y)

f contains the estimated parameters (f.params) and other fit results, which you can use for plotting and so on:

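# Evaluate the fitted curve over the range of the selected data,
# under a new name so the original x is not overwritten.
x_fit = np.linspace(reg_x.min(), reg_x.max(), 100)
plt.plot(x_fit, model(f.params, x_fit), linestyle='-', color='black', marker='')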
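Since the model can in principle be anything, here is a sketch of the fitting step with scipy.optimize.curve_fit instead of kapteyn, using a nonlinear logistic curve. The logistic model and its parameter names are purely illustrative (they are not what the answer fits), and the data are noiseless synthetic values so the recovered parameters can be checked. Note that curve_fit expects the model as f(x, *params), whereas kmpfit.simplefit takes model(p, x).

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, top, x0, width):
    """Hypothetical model: rises from 0 to `top`, centred at x0."""
    return top / (1.0 + np.exp(-(x - x0) / width))

# Noiseless synthetic data on the question's log10(x) grid.
log_x = np.log10((20 - np.arange(20))**2)
true_p = [0.38, 2.0, 0.1]
y = logistic(log_x, *true_p)

# popt: fitted parameters; pcov: their estimated covariance
# (scaled by the residual variance, since absolute_sigma defaults to False).
popt, pcov = curve_fit(logistic, log_x, y, p0=[0.35, 2.05, 0.12])

# Smooth curve for plotting, e.g. plt.plot(x_fit, y_fit, 'k-').
x_fit = np.linspace(log_x.min(), log_x.max(), 100)
y_fit = logistic(x_fit, *popt)
```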

For the confidence bands, we need the partial derivatives of the model with respect to the parameters (yes, some math). Again, this is easy for a polynomial model and shouldn't be a problem for other differentiable models either.
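For reference, these are the standard delta-method formulas that the derivatives feed into (a textbook sketch, not kapteyn's source). Here \hat p are the fitted parameters, \Sigma is their covariance matrix already scaled by the residual variance s^2, N is the number of data points and P the number of parameters; for a model that is linear in its parameters, such as the quadratic, the variance expression is exact.

```latex
s^2 = \frac{1}{N-P}\sum_{i=1}^{N}\bigl(y_i - f(x_i;\hat p)\bigr)^2,
\qquad
g_j(x) = \left.\frac{\partial f(x;p)}{\partial p_j}\right|_{p=\hat p}

\operatorname{Var}[\hat y(x)] \approx g(x)^{\mathsf T}\,\Sigma\,g(x) = \sum_{j,k} g_j(x)\,g_k(x)\,\Sigma_{jk}

\text{confidence band: } \hat y(x) \pm t_{1-\alpha/2,\,N-P}\,\sqrt{g(x)^{\mathsf T}\,\Sigma\,g(x)}

\text{prediction band: } \hat y(x) \pm t_{1-\alpha/2,\,N-P}\,\sqrt{s^2 + g(x)^{\mathsf T}\,\Sigma\,g(x)}

\text{quadratic model } f = a + b\,x + c\,x^2: \quad g(x) = (1,\ x,\ x^2)
```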

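# Partial derivatives of a + b*x + c*x**2 with respect to (a, b, c):
dfdp = [1., reg_x, reg_x**2]
# kapteyn's confprob argument is a percentage, so 95% is 95.0 (not 0.95).
_, ci_upper, ci_lower = f.confidence_band(reg_x, dfdp, 95.0, model)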

# Plot.
plt.plot(reg_x, ci_upper, linestyle='--', color='black', marker='')
plt.plot(reg_x, ci_lower, linestyle='--', color='black', marker='')
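As mentioned above, the confidence band is straightforward to implement with the delta method. For the quadratic model it takes only a few lines of numpy; the sketch below assumes numpy and scipy (for the t quantile), uses synthetic stand-in data, and evaluates the band on a fine grid rather than only at reg_x. Because the model is linear in its parameters, the design-matrix columns are exactly the dfdp entries [1, x, x**2] passed to kapteyn.

```python
import numpy as np
from scipy.stats import t

# Synthetic stand-in for reg_x / reg_y (log10(x) between 2 and 2.6).
rng = np.random.default_rng(1)
reg_x = np.linspace(2.0, 2.6, 11)
reg_y = -5.0 + 4.0 * reg_x - 0.7 * reg_x**2 + rng.normal(0, 0.02, reg_x.size)

# Linear least squares for a + b*x + c*x**2; the design-matrix columns
# are exactly the partial derivatives dfdp = [1, x, x**2].
X = np.column_stack([np.ones_like(reg_x), reg_x, reg_x**2])
params, *_ = np.linalg.lstsq(X, reg_y, rcond=None)
dof = reg_x.size - params.size
s2 = np.sum((reg_y - X @ params)**2) / dof          # residual variance
cov = s2 * np.linalg.inv(X.T @ X)                   # parameter covariance

# Delta method on a fine grid: Var[yhat] = g^T cov g with g = (1, x, x**2).
x_fit = np.linspace(reg_x.min(), reg_x.max(), 100)
G = np.column_stack([np.ones_like(x_fit), x_fit, x_fit**2])
yhat = G @ params
half = t.ppf(0.975, dof) * np.sqrt(np.einsum('ni,ij,nj->n', G, cov, G))
ci_lower, ci_upper = yhat - half, yhat + half
```

Plotting is unchanged, e.g. plt.plot(x_fit, ci_upper, linestyle='--', color='black').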

Unfortunately, there is no prediction_band() routine in the package, at least none that I know of. Assuming you find some method for the prediction band, the preparation and plotting would look the same:

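# prediction_band() is a hypothetical placeholder for whatever method you use;
# it is not a kapteyn routine.
p_upper, p_lower = prediction_band(*args, **kwargs)
plt.fill_between(reg_x, p_upper, p_lower, facecolor='blue', alpha=0.2, linestyle='')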
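One way to fill in the missing prediction band is the delta method plus the residual variance: a new observation scatters around the fitted curve with variance s2 on top of the uncertainty of the curve itself. The sketch below assumes scipy, fits the quadratic with scipy.optimize.curve_fit on synthetic stand-in data, and approximates the partial derivatives by central differences so that any model written as f(x, *params) can be used. The helper names param_gradient and bands are hypothetical, not a library API.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import t

def model(x, a, b, c):
    return a + b * x + c * x**2

def param_gradient(f, x, p, eps=1e-6):
    """Central-difference df/dp_j at each x; shape (len(p), len(x))."""
    p = np.asarray(p, dtype=float)
    steps = eps * np.maximum(1.0, np.abs(p))
    return np.array([(f(x, *(p + d)) - f(x, *(p - d))) / (2 * h)
                     for d, h in zip(np.diag(steps), steps)])

def bands(f, x, xdata, ydata, popt, pcov, conf=0.95):
    """Return yhat, (ci_lower, ci_upper) and (pi_lower, pi_upper) at x.
    Assumes pcov comes from curve_fit with absolute_sigma=False, i.e. it is
    already scaled by the residual variance."""
    dof = len(xdata) - len(popt)
    s2 = np.sum((ydata - f(xdata, *popt))**2) / dof    # residual variance
    g = param_gradient(f, x, popt)
    var_mean = np.einsum('jn,jk,kn->n', g, pcov, g)    # variance of fitted curve
    tval = t.ppf(0.5 + conf / 2, dof)                  # two-sided t quantile
    yhat = f(x, *popt)
    ci = tval * np.sqrt(var_mean)           # confidence band half-width
    pi = tval * np.sqrt(var_mean + s2)      # prediction band half-width
    return yhat, (yhat - ci, yhat + ci), (yhat - pi, yhat + pi)

# Synthetic stand-in for reg_x / reg_y.
rng = np.random.default_rng(2)
reg_x = np.linspace(2.0, 2.6, 11)
reg_y = model(reg_x, -5.0, 4.0, -0.7) + rng.normal(0, 0.02, reg_x.size)

popt, pcov = curve_fit(model, reg_x, reg_y, p0=[0.1, 0.1, 0.1])
x_fit = np.linspace(reg_x.min(), reg_x.max(), 100)
yhat, (ci_lower, ci_upper), (pi_lower, pi_upper) = bands(
    model, x_fit, reg_x, reg_y, popt, pcov)
```

The result plugs into the answer's plotting call, e.g. plt.fill_between(x_fit, pi_upper, pi_lower, facecolor='blue', alpha=0.2). If the fit uses the replicate means, the band describes where a new condition mean would fall, not a single replicate.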

Hope this helps, L.
