Calculating and plotting the 95% range of scatter plot data in Python

Question

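I would like to know, for a given predicted commute duration (in minutes), the range of actual commute durations I might expect. For example, if Google Maps predicts my commute will take 20 minutes, what are the minimum and maximum commute times I should expect (perhaps a 95% range)?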

Let's import the data into pandas:

%matplotlib inline
import pandas as pd

commutes = pd.read_csv('https://raw.githubusercontent.com/blokeley/commutes/master/commutes.csv')
commutes.tail()

This displays the last few rows of the data.

We can easily create a plot showing the scatter of the raw data, the regression line, and the 95% confidence interval on that line:

import seaborn as sns

# Create a linear model plot
sns.lmplot(x='prediction', y='duration', data=commutes);

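How do I now calculate and plot the 95% range of actual commute times against predicted times?

In other words, if Google Maps predicts my commute will take 20 minutes, it looks like it could actually take anywhere between 14 and 28 minutes. It would be great to calculate or plot this range.

Thanks in advance for any help.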

Answer 1

The relationship between the actual duration of a commute and the prediction should be linear, so I can use quantile regression:

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Import data
commutes = pd.read_csv('https://raw.githubusercontent.com/blokeley/commutes/master/commutes.csv')

# Create the quantile regression model
model = smf.quantreg('duration ~ prediction', commutes)

# Create a list of quantiles to calculate
quantiles = [0.05, 0.25, 0.50, 0.75, 0.95]

# Create a list of fits
fits = [model.fit(q=q) for q in quantiles]

# Create a new figure and axes
figure, axes = plt.subplots()

# Plot the scatter of data points
x = commutes['prediction']
axes.scatter(x, commutes['duration'], alpha=0.4)

# Create an array of predictions from the minimum to maximum to create the regression line
_x = np.linspace(x.min(), x.max())

for index, quantile in enumerate(quantiles):
    # Plot the quantile lines
    _y = fits[index].params['prediction'] * _x + fits[index].params['Intercept']
    axes.plot(_x, _y, label=quantile)

# Plot the line of perfect prediction
axes.plot(_x, _x, 'g--', label='Perfect prediction')
axes.legend()
axes.set_xlabel('Predicted duration (minutes)')
axes.set_ylabel('Actual duration (minutes)');

This plots the data points together with the fitted quantile lines and the line of perfect prediction.

Many thanks to my colleague Philip for the quantile regression tip.
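Note that the 0.05 and 0.95 lines bracket the central 90% of the data; for a 95% range the 0.025 and 0.975 quantiles would be the ones to fit. As a rough sketch of how the fitted lines could be turned into a numeric range for a 20-minute prediction, the block below uses synthetic data as a stand-in for `commutes.csv` (the slope, noise level and sample size are assumptions for illustration only, not properties of the real data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for commutes.csv (an assumption, not the real data)
rng = np.random.default_rng(0)
prediction = rng.uniform(15, 35, size=500)
commutes = pd.DataFrame({
    'prediction': prediction,
    'duration': 1.1 * prediction + rng.normal(0, 3, size=500),
})

model = smf.quantreg('duration ~ prediction', commutes)

# The 2.5% and 97.5% quantile lines bracket the central 95% of the data
lower_fit = model.fit(q=0.025)
upper_fit = model.fit(q=0.975)

# Evaluate both lines at a predicted duration of 20 minutes
query = pd.DataFrame({'prediction': [20.0]})
low = lower_fit.predict(query).iloc[0]
high = upper_fit.predict(query).iloc[0]
print(f'20 min predicted: expect roughly {low:.1f} to {high:.1f} min')

# Sanity check: fraction of observations lying between the two lines
inside = commutes['duration'].between(lower_fit.predict(commutes),
                                      upper_fit.predict(commutes))
coverage = inside.mean()
print(f'In-sample coverage: {coverage:.3f}')
```

With the real data, replacing the synthetic frame with the `pd.read_csv` call above should work unchanged. The band between the two lines can be shaded on the existing axes with `axes.fill_between`.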

Answer 2

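You could try fitting a Gaussian distribution to your data. About 95% of the results fall within roughly 2 standard deviations (1.96 sigma) of the mean; 3 sigma would cover about 99.7%.

Look up the normal distribution.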
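A minimal sketch of that idea, under the assumption that the scatter around a straight-line fit is roughly normal with the same spread at every predicted duration (neither is verified for the real data). The synthetic data and its parameters are placeholders for illustration only:

```python
import numpy as np

# Synthetic stand-in for the commute data (an assumption, not the real CSV)
rng = np.random.default_rng(0)
prediction = rng.uniform(15, 35, size=500)
duration = 1.1 * prediction + rng.normal(0, 3, size=500)

# Ordinary least-squares line through the data
slope, intercept = np.polyfit(prediction, duration, 1)

# Standard deviation of the residuals around that line
residuals = duration - (slope * prediction + intercept)
sigma = residuals.std(ddof=2)  # two parameters were fitted

# About 95% of a normal distribution lies within 1.96 sigma of its mean
z = 1.96
predicted = 20.0
centre = slope * predicted + intercept
low, high = centre - z * sigma, centre + z * sigma
print(f'{predicted:.0f} min predicted: expect roughly {low:.1f} to {high:.1f} min')

# Sanity check: fraction of observations inside the +/- 1.96 sigma band
coverage = np.mean(np.abs(residuals) <= z * sigma)
print(f'In-sample coverage: {coverage:.3f}')
```

Unlike quantile regression, this gives a band that is symmetric about the fitted line, so it may fit poorly if delays are skewed towards longer journeys.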
