
How to add error bars to a grouped bar plot?

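I want to add error bars to my plot so that I can show the min and max of each bar. Could anyone help me? Thanks in advance.

The min and max values are as follows:

Delay = (53.46 (min 0, max 60), 36.22 (min 12, max 70), 83 (min 21, max 54), 17 (min 12, max 70))
Latency = (38 (min 2, max 70), 44 (min 12, max 87), 53 (min 9, max 60), 10 (min 11, max 77))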

import matplotlib.pyplot as plt
import pandas as pd

Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency}, index=index)
ax = df.plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)
plt.savefig('TestX.png', dpi=300, bbox_inches='tight')
plt.show()


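  • To draw at the correct location on a bar plot, the patch data for each bar must be extracted.
  • df.plot.bar returns a single matplotlib.axes.Axes here; an ndarray with one matplotlib.axes.Axes per column is returned only when subplots=True.
    • For this figure, ax.patches contains 8 matplotlib.patches.Rectangle objects, one for each bar.
      • Using the methods of this object, the height, width, and x position can be extracted and used to draw a line with plt.vlines.
  • The height of the bar is used to look up the correct min and max in the dict z.
    • Unfortunately, the patch data does not contain the bar label (e.g. Delay & Latency).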
import pandas as pd
import matplotlib.pyplot as plt

# create dataframe
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency}, index=index)

# dicts with errors
Delay_error = {53.46: {'min': 0,'max': 60}, 36.22: {'min': 12,'max': 70}, 83: {'min': 21,'max': 54}, 17: {'min': 12,'max': 70}}
Latency_error = {38: {'min': 2, 'max': 70}, 44: {'min': 12,'max': 87}, 53: {'min': 9,'max': 60}, 10: {'min': 11,'max': 77}}

# combine them; providing all the keys are unique
z = {**Delay_error, **Latency_error}

# plot
ax = df.plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)

for p in ax.patches:
    x = p.get_x()  # get the bottom left x corner of the bar
    w = p.get_width()  # get width of bar
    h = p.get_height()  # get height of bar
    min_y = z[h]['min']  # use h to get min from dict z
    max_y = z[h]['max']  # use h to get max from dict z
    plt.vlines(x+w/2, min_y, max_y, color='k')  # draw a vertical line


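  • If the two dicts share keys (bar heights) and therefore can't be combined, the correct dict can be selected based on the bar order.
  • All the bars for a single label are plotted first.
    • In this case, indices 0-3 are the Delay bars and 4-7 are the Latency bars.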
for i, p in enumerate(ax.patches):
    print(i, p)
    x = p.get_x()
    w = p.get_width()
    h = p.get_height()
    
    if i < len(ax.patches)/2:  # select which dictionary to use
        d = Delay_error
    else:
        d = Latency_error
        
    min_y = d[h]['min']
    max_y = d[h]['max']
    plt.vlines(x+w/2, min_y, max_y, color='k')
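
Looking values up by bar height, or by patch index, can be fragile. As a hedged alternative sketch: the Axes also exposes ax.containers, and for a pandas bar plot each BarContainer there carries the column name as its label. The `limits` dict and the `drawn` list below are illustrative names, not part of the original answer.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.container import BarContainer

index = ['T=0', 'T=26', 'T=50', 'T=900']
df = pd.DataFrame({'Delay': (53.46, 36.22, 83, 17),
                   'Latency': (38, 44, 53, 10)}, index=index)

# (min, max) per bar, keyed by column label, in the same order as the index
# (illustrative structure; values are the ones given in the question)
limits = {
    'Delay':   [(0, 60), (12, 70), (21, 54), (12, 70)],
    'Latency': [(2, 70), (12, 87), (9, 60), (11, 77)],
}

ax = df.plot.bar(rot=0)

drawn = []  # (label, x_center, min_y, max_y) for every line drawn
for container in ax.containers:
    if not isinstance(container, BarContainer):
        continue  # skip anything that is not a group of bars
    label = container.get_label()  # column name, e.g. 'Delay'
    for patch, (min_y, max_y) in zip(container.patches, limits[label]):
        x_center = patch.get_x() + patch.get_width() / 2
        ax.vlines(x_center, min_y, max_y, color='k')
        drawn.append((label, x_center, min_y, max_y))
```

Because the lookup goes through the label, it keeps working when two bars have the same height or when the column order changes.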

Some zipping and stacking will suffice; see bar_min_maxs below. This simplifies and slightly generalizes Trenton's code:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# create dataframe
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency,
                   'Delay_min':   (0,  12, 21, 12),  # supply min and max
                   'Delay_max':   (60, 70, 54, 70),
                   'Latency_min': (2,  12, 9,  11),
                   'Latency_max': (70, 87, 60, 77)},
                  index=index)

# plot
ax = df[['Delay', 'Latency']].plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)

# bar_min_maxs[i] is bar/patch i's min, max
bar_min_maxs = np.vstack((list(zip(df['Delay_min'], df['Delay_max'])),
                          list(zip(df['Latency_min'], df['Latency_max']))))
assert len(bar_min_maxs) == len(ax.patches)

for patch, (min_y, max_y) in zip(ax.patches, bar_min_maxs):
    plt.vlines(patch.get_x() + patch.get_width()/2,
               min_y, max_y, color='k')

[Figure: min_max_barplot]
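
Since the lines are drawn from min to max regardless of the bar height, it may be worth checking that each bar's value actually lies inside its range. A small sketch of such a check, using the same column naming as above; the helper name `out_of_range` is an assumption made for illustration:

```python
import pandas as pd

index = ['T=0', 'T=26', 'T=50', 'T=900']
df = pd.DataFrame({'Delay': (53.46, 36.22, 83, 17),
                   'Latency': (38, 44, 53, 10),
                   'Delay_min':   (0,  12, 21, 12),
                   'Delay_max':   (60, 70, 54, 70),
                   'Latency_min': (2,  12, 9,  11),
                   'Latency_max': (70, 87, 60, 77)},
                  index=index)


def out_of_range(df, columns):
    """Return (row label, column) pairs whose value lies outside [min, max]."""
    flagged = []
    for col in columns:
        # True where min <= value <= max (bounds inclusive)
        inside = df[col].between(df[f'{col}_min'], df[f'{col}_max'])
        flagged.extend((row, col) for row in inside.index[~inside.to_numpy()])
    return flagged


flagged = out_of_range(df, ['Delay', 'Latency'])
print(flagged)  # [('T=50', 'Delay'), ('T=900', 'Latency')]
```

With the question's numbers, two bars fall outside their stated range (83 is above a max of 54, and 10 is below a min of 11), so the drawn line will not touch the top of those bars.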

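If the error bars are instead expressed through margins of error rather than mins and maxes, i.e., each error bar is centered at the bar's height with a length of 2 x the margin of error, then here is the code to plot them: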

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# create dataframe
Delay = (53.46, 36.22, 83, 17)
Latency = (38, 44, 53, 10)
index = ['T=0', 'T=26', 'T=50','T=900']
df = pd.DataFrame({'Delay': Delay, 'Latency': Latency,
                   'Delay_moe':   (5,  15, 25, 35),  # supply margin of error
                   'Latency_moe': (10, 20, 30, 40)},
                  index=index)

# plot
ax = df[['Delay', 'Latency']].plot.bar(rot=0)
plt.xlabel('Time')
plt.ylabel('(%)')
plt.ylim(0, 101)

# bar_moes[i] is bar/patch i's margin of error, i.e., half the length of an
# errorbar centered at the bar's height
bar_moes = np.ravel(df[['Delay_moe', 'Latency_moe']].values.T)
assert len(bar_moes) == len(ax.patches)

for patch, moe in zip(ax.patches, bar_moes):
    height = patch.get_height() # of bar
    min_y, max_y = height - moe, height + moe
    plt.vlines(patch.get_x() + patch.get_width()/2,
               min_y, max_y, color='k')

[Figure: moe_barplot]
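
For symmetric margins of error, pandas can also draw the error bars itself through the yerr argument of the plot call, which avoids touching the patches at all. A minimal sketch, assuming the margins are kept in a separate DataFrame whose column names match the plotted columns (the `values` and `moes` names are illustrative):

```python
import matplotlib.pyplot as plt
import pandas as pd

index = ['T=0', 'T=26', 'T=50', 'T=900']
values = pd.DataFrame({'Delay': (53.46, 36.22, 83, 17),
                       'Latency': (38, 44, 53, 10)}, index=index)

# margins of error; column names must match those of `values`
moes = pd.DataFrame({'Delay': (5, 15, 25, 35),
                     'Latency': (10, 20, 30, 40)}, index=index)

# yerr draws value +/- moe on each bar; capsize adds small caps to the lines
ax = values.plot.bar(rot=0, yerr=moes, capsize=4)
ax.set_xlabel('Time')
ax.set_ylabel('(%)')
```

Call plt.show() or plt.savefig(...) afterwards to view the result. This route does not fit the min/max case above as-is, because there the lines are not centered on the bar heights.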

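A small statistical note: if you are interested in the difference between the two groups (Delay and Latency at each T=t), then add a plot of the difference, with an error bar for the difference. A plot like the ones above is not sufficient for directly analyzing differences; for example, if the two error bars overlap at T=0, that does not imply that the difference between Delay and Latency is not statistically significant at whatever level was used. (If they do not overlap, though, then the difference is statistically significant.)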
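
A small worked sketch of that point, using hypothetical numbers that are not taken from the question. It assumes the two estimates are independent and that both margins of error are at the same confidence level, in which case the margin of error of the difference is commonly taken as the root sum of squares of the two margins:

```python
import math


def difference_with_moe(a, moe_a, b, moe_b):
    """Return (a - b, margin of error of a - b), assuming independent estimates."""
    return a - b, math.hypot(moe_a, moe_b)


def intervals_overlap(a, moe_a, b, moe_b):
    """True if [a - moe_a, a + moe_a] and [b - moe_b, b + moe_b] overlap."""
    return (a - moe_a) <= (b + moe_b) and (b - moe_b) <= (a + moe_a)


# hypothetical values: the intervals are [7, 13] and [12, 18]
a, moe_a = 10.0, 3.0
b, moe_b = 15.0, 3.0

bars_overlap = intervals_overlap(a, moe_a, b, moe_b)
diff, moe_diff = difference_with_moe(b, moe_b, a, moe_a)
diff_excludes_zero = abs(diff) > moe_diff

print(bars_overlap)              # True: 12 <= 13
print(diff, round(moe_diff, 2))  # 5.0 4.24
print(diff_excludes_zero)        # True: 5.0 > 4.24
```

Here the two error bars overlap, yet the interval for the difference (5.0 +/- 4.24) excludes zero, which is why the difference deserves its own error bar.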

