
Horizontal stacked bar plot and add labels to each section

I am trying to replicate the following image in matplotlib, and it seems barh is my only option. However, it appears that you can't stack barh graphs, so I don't know what to do.

[enter image description here]

If you know of a better Python library to draw this kind of thing, please let me know.

This is all I could come up with as a start:

import numpy as np
import matplotlib.pyplot as plt

plt.rcdefaults()

people = ('A','B','C','D','E','F','G','H')
y_pos = np.arange(len(people))
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.barh(y_pos, bottomdata,color='r',align='center')
ax.barh(y_pos, topdata,color='g',align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')

plt.show()

I would then have to add labels individually using ax.text, which would be tedious. Ideally, I would like to just specify the width of the part to be inserted, and have the center of that section updated with a string of my choosing. The labels on the outside (e.g. 3800) I can add myself later; what I'm having problems with is mainly the labeling over the bar sections themselves and creating this stacked layout in a nice way. Can you even specify a 'distance', i.e. a span of color, in any way?

[enter image description here]

Edit 2: for more heterogeneous data. (I've left the earlier method in place, since I find it more usual to work with the same number of records per series.)

Answering the two parts of the question:

a) barh returns a container of handles to all the patches that it drew. You can use the coordinates of the patches to position the text.

b) Following the two answers to the question that I noted before (see Horizontal stacked bar chart in Matplotlib), you can stack bar graphs horizontally by setting the 'left' input.

and additionally c) handling data that is less uniform in shape.
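As a minimal sketch of points a) and b), the snippet below stacks the two series from the question by passing the first series as `left` for the second, then labels each segment at the center of its patch. The `Agg` backend, the fixed seed, and the `%.1f` label format are assumptions chosen to keep the example reproducible; they are not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
people = ("A", "B", "C", "D", "E", "F", "G", "H")
y_pos = np.arange(len(people))
bottomdata = 3 + 10 * rng.random(len(people))
topdata = 3 + 10 * rng.random(len(people))

fig, ax = plt.subplots(figsize=(10, 8))

# The second call starts where the first one ended; this is what stacks the bars.
first = ax.barh(y_pos, bottomdata, color="r", align="center")
second = ax.barh(y_pos, topdata, color="g", align="center", left=bottomdata)

# Each call returns a container of Rectangle patches; use their geometry for the text.
label_positions = []
for container in (first, second):
    for patch in container.patches:
        x = patch.get_x() + 0.5 * patch.get_width()
        y = patch.get_y() + 0.5 * patch.get_height()
        ax.text(x, y, "%.1f" % patch.get_width(), ha="center", va="center")
        label_positions.append((x, y))

ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel("Distance")
fig.canvas.draw()  # render once; replace with plt.show() in an interactive session
```

With equal-length series like these, one `barh` call per series is enough; the per-segment approach is only needed when rows have different numbers of segments.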

One way to handle data that is less uniform in shape is simply to process each segment independently.

import numpy as np
import matplotlib.pyplot as plt

# some labels for each row
people = ('A','B','C','D','E','F','G','H')
r = len(people)

# how many data points overall (average of 3 per person)
n = r * 3

# which person does each segment belong to?
rows = np.random.randint(0, r, (n,))
# how wide is the segment?
widths = np.random.randint(3,12, n,)
# what label to put on the segment (xrange in py2.7, range for py3)
labels = range(n)
colors ='rgbwmc'

patch_handles = []

fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)



left = np.zeros(r,)
row_counts = np.zeros(r,)

# use `row` as the loop variable so that it does not shadow r = len(people)
for (row, w, l) in zip(rows, widths, labels):
    print(row, w, l)
    patch_handles.append(ax.barh(row, w, align='center', left=left[row],
        color=colors[int(row_counts[row]) % len(colors)]))
    left[row] += w
    row_counts[row] += 1
    # we know there is only one patch but could enumerate if expanded
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y, "%d%%" % (l), ha='center', va='center')

y_pos = np.arange(len(people))
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')

plt.show()

Which produces a graph like this [image: heterogeneous hbar], with a different number of segments present in each series.

Note that this is not particularly efficient, since each segment uses an individual call to ax.barh. There may be more efficient methods (e.g. padding a matrix with zero-width segments or nan values), but this is likely to be problem-specific and is a distinct question.
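One possible reading of the zero-padding idea is sketched below: pack the per-segment widths into a rows-by-segments matrix, filling missing slots with zero, so that `barh` is called once per segment position rather than once per segment. The helper name `pad_segments` and the sample data are hypothetical and not taken from the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np


def pad_segments(rows, widths, n_rows):
    """Pack per-segment widths into an (n_rows, max_segments) zero-padded matrix."""
    counts = np.bincount(rows, minlength=n_rows)
    padded = np.zeros((n_rows, counts.max()))
    filled = np.zeros(n_rows, dtype=int)  # number of segments placed in each row so far
    for row, width in zip(rows, widths):
        padded[row, filled[row]] = width
        filled[row] += 1
    return padded


# hypothetical data: the row each segment belongs to, and its width
rows = np.array([0, 1, 1, 2, 0, 1])
widths = np.array([4, 3, 5, 6, 2, 1])
padded = pad_segments(rows, widths, n_rows=3)

fig, ax = plt.subplots()
y_pos = np.arange(padded.shape[0])
left = np.zeros(padded.shape[0])
for column in padded.T:  # one barh call per segment position
    ax.barh(y_pos, column, left=left, align="center")
    left += column
```

The padded slots are still drawn as zero-width rectangles, so any labeling loop over the patches should skip those whose width is 0.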


Edit: updated to answer both parts of the question.

import numpy as np
import matplotlib.pyplot as plt

people = ('A','B','C','D','E','F','G','H')
segments = 4

# generate some multi-dimensional data & arbitrary labels
data = 3 + 10* np.random.rand(segments, len(people))
percentages = (np.random.randint(5,20, (len(people), segments)))
y_pos = np.arange(len(people))

fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)

colors ='rgbwmc'
patch_handles = []
left = np.zeros(len(people)) # left alignment of data starts at zero
for i, d in enumerate(data):
    patch_handles.append(ax.barh(y_pos, d, 
      color=colors[i%len(colors)], align='center', 
      left=left))
    # accumulate the left-hand offsets
    left += d
    
# go through all of the bar segments and annotate
for j, container in enumerate(patch_handles):
    for i, patch in enumerate(container.patches):
        bl = patch.get_xy()
        x = 0.5*patch.get_width() + bl[0]
        y = 0.5*patch.get_height() + bl[1]
        # y is the vertical center of the bar, so center the text on it as well
        ax.text(x, y, "%d%%" % (percentages[i, j]), ha='center', va='center')

ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')

plt.show()

You can achieve a result along these lines (note: the percentages I used have nothing to do with the bar widths, as the relationship in the example seems unclear):

[image: example output]

See Horizontal stacked bar chart in Matplotlib for some ideas on stacking horizontal bar plots.


Imports and Test DataFrame

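import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used by the plotting snippets below
import seaborn as sns  # used by the seaborn snippets below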

# create sample data as shown in the OP
np.random.seed(365)
people = ('A','B','C','D','E','F','G','H')
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))

# create the dataframe
df = pd.DataFrame({'Female': bottomdata, 'Male': topdata}, index=people)

# display(df)
   Female   Male
A   12.41   7.42
B    9.42   4.10
C    9.85   7.38
D    8.89  10.53
E    8.44   5.92
F    6.68  11.86
G   10.67  12.97
H    6.05   7.87

Updated with matplotlib v3.4.2

Plotted using pandas.DataFrame.plot with kind='barh'

ax = df.plot(kind='barh', stacked=True, figsize=(8, 6))

for c in ax.containers:
    
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')

    # uncomment and use the next line if there are no nan or 0 length sections; just use fmt to add a % (the previous two lines of code are not needed, in this case)
#     ax.bar_label(c, fmt='%.2f%%', label_type='center')

# move the legend
ax.legend(bbox_to_anchor=(1.025, 1), loc='upper left', borderaxespad=0.)

# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()

[enter image description here]
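The same labeling call also works on bars drawn directly with matplotlib, without pandas. The sketch below assumes matplotlib 3.4 or later (where `Axes.bar_label` is available) and uses hypothetical values, including one zero-length segment to show the blank-label guard.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

people = ("A", "B", "C", "D")
female = np.array([12.4, 9.4, 0.0, 8.9])  # hypothetical values; one zero-length segment
male = np.array([7.4, 4.1, 7.4, 10.5])

fig, ax = plt.subplots(figsize=(8, 6))
ax.barh(people, female, label="Female")
ax.barh(people, male, left=female, label="Male")  # stacked on top of the first series

all_labels = []
for c in ax.containers:
    # blank label when the segment has no width, otherwise the width as a percentage
    labels = [f"{v.get_width():.2f}%" if v.get_width() > 0 else "" for v in c]
    ax.bar_label(c, labels=labels, label_type="center")
    all_labels.append(labels)

ax.legend()
fig.canvas.draw()  # render once; replace with plt.show() in an interactive session
```

Each `barh` call adds one container to `ax.containers`, so the loop visits the series in the order they were drawn.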

Using seaborn

Reshape dataframe

# convert the dataframe to a long form
df = df.reset_index()
df = df.rename(columns={'index': 'People'})
dfm = df.melt(id_vars='People', var_name='Gender', value_name='Percent')

# display(dfm)
   People  Gender    Percent
0       A  Female  12.414557
1       B  Female   9.416027
2       C  Female   9.846105
3       D  Female   8.885621
4       E  Female   8.438872
5       F  Female   6.680709
6       G  Female  10.666258
7       H  Female   6.050124
8       A    Male   7.420860
9       B    Male   4.104433
10      C    Male   7.383738
11      D    Male  10.526158
12      E    Male   5.916262
13      F    Male  11.857227
14      G    Male  12.966913
15      H    Male   7.865684

sns.histplot: axes-level plot

fig, axe = plt.subplots(figsize=(8, 6))
sns.histplot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', ax=axe)

# iterate through each set of containers
for c in axe.containers:
    # add bar annotations
    axe.bar_label(c, fmt='%.2f%%', label_type='center')

axe.set_xlabel('Percent')
plt.show()

[enter image description here]

sns.displot: figure-level plot

g = sns.displot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', height=6)

# iterate through each facet / subplot
for axe in g.axes.flat:
    # iterate through each set of containers
    for c in axe.containers:
        # add the bar annotations
        axe.bar_label(c, fmt='%.2f%%', label_type='center')
    axe.set_xlabel('Percent')

plt.show()

[enter image description here]

Original Answer - before matplotlib v3.4.2

  • The easiest way to plot a horizontal or vertical stacked bar is to load the data into a pandas.DataFrame
    • This will plot and annotate correctly, even when not all categories ('People') have all segments (e.g. some value is 0 or NaN)
  • Once the data is in the dataframe:
    1. It's easier to manipulate and analyze
    2. It can be plotted with the matplotlib engine, using pandas.DataFrame.plot (e.g. df.plot.barh, or df.plot with kind='barh')
  • These methods return a matplotlib.axes.Axes or a numpy.ndarray of them.
  • The .patches attribute of the Axes holds a list of matplotlib.patches.Rectangle objects, one for each of the sections of the stacked bar.
    • Each .Rectangle has methods for extracting the various values that define the rectangle.
    • Each .Rectangle is in order from left to right, and bottom to top, so all the .Rectangle objects, for each level, appear in order when iterating through .patches.
  • The labels are made using an f-string, label_text = f'{width:.2f}%', so any additional text can be added as needed.
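To make the ordering of `.patches` concrete, the sketch below maps each rectangle back to the dataframe cell it was drawn from. It assumes that pandas draws a stacked bar plot one column at a time, so that patch `i` belongs to column `i // len(df)` and row `i % len(df)`; the small dataframe is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Female": [12.4, 9.4, 9.8], "Male": [7.4, 4.1, 7.4]},  # hypothetical values
    index=["A", "B", "C"],
)
ax = df.plot.barh(stacked=True)

n_rows = len(df)
recovered = []
for i, rect in enumerate(ax.patches):
    column = df.columns[i // n_rows]  # all rectangles of the first column come first
    person = df.index[i % n_rows]     # within a column, rows appear in index order
    recovered.append((person, column, rect.get_width()))
```

If a plot contains other patches besides the bars, iterating over `ax.containers` instead of `ax.patches` is likely to be the safer way to recover the same grouping.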

Plot and Annotate

  • Plotting the bar is 1 line; the remainder is annotating the rectangles
# plot the dataframe with 1 line
ax = df.plot.barh(stacked=True, figsize=(8, 6))

# .patches is everything inside of the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    
    # For a horizontal bar, the width is the data value and can be used as the label
    label_text = f'{width:.2f}%'  # f'{width:.2f}' to format decimal values
    
    # ax.text(x, y, text)
    label_x = x + width / 2
    label_y = y + height / 2
    
    # only plot labels greater than given width
    if width > 0:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)

# move the legend
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)

# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()

[enter image description here]

Example with Missing Segment

# set one of the dataframe values to 0
df.iloc[4, 1] = 0
  • Note the annotations are all in the correct location from df.

[enter image description here]

For this case, the above answers work perfectly. The issue I had, and for which I didn't find a plug-and-play solution online, was that I often have to plot stacked bars in multi-subplot figures, with many values that tend to have very non-homogeneous amplitudes.

(Note: I usually work with pandas dataframes and matplotlib. I couldn't get the bar_label() method of matplotlib to work every time.)

So, I just give a kind of ad-hoc, but easily generalizable, solution. In this example, I was working with single-row dataframes (for hourly power-exchange monitoring), so my dataframe (df) had just one row.

(I provide an example figure to show how this can be useful in very densely packed plots.)

[enter image description here][1] [1]: https://i.stack.imgur.com/9akd8.png

import matplotlib.pyplot as plt


def magic_stacked_bar(df, waterfall=False, cyclic_offset_x=None, cyclic_offset_y=None, ax=None):
    '''
    This implementation produces a stacked, horizontal bar plot.

    df --> pandas dataframe. Columns are used as the iterator, and only the first value of each column is used.
    waterfall --> bool: if True, apart from the stack-direction, also a perpendicular offset is added.
    cyclic_offset_x --> list (of any length) or None: loop through these values to use as x-offset pixels.
    cyclic_offset_y --> list (of any length) or None: loop through these values to use as y-offset pixels.
    ax --> matplotlib Axes, or None: if None, creates a new axis and figure.
    '''
    if cyclic_offset_x is None:
        cyclic_offset_x = [0, 0]
    if cyclic_offset_y is None:
        cyclic_offset_y = [0, 0]

    ax0 = ax
    if ax is None:
        fig, ax = plt.subplots()
        fig.set_size_inches(19, 10)

    prev = 0  # summation variable to make it stacked
    for cycler, c in enumerate(df.columns):
        value = df[c].values[0]  # only the first row of each column is plotted
        if waterfall:
            y = c; label = ""  # bidirectional stack
        else:
            y = 0; label = c  # unidirectional stack
        ax.barh(y=y, width=value, height=1, left=prev, label=label)
        prev += value  # add to sum-stack

        # cycle through the offset lists, wrapping around when they are shorter than df.columns
        offset_x = cyclic_offset_x[cycler % len(cyclic_offset_x)]
        offset_y = cyclic_offset_y[cycler % len(cyclic_offset_y)]

        # annotate the horizontal center of the segment that was just drawn
        ax.annotate(text="{}".format(int(value)), xy=(prev - value / 2, y),
                    xytext=(offset_x, offset_y), textcoords='offset pixels',
                    ha='center', va='top', fontsize=8,
                    arrowprops=dict(facecolor='black', shrink=0.01, width=0.3, headwidth=0.3),
                    bbox=dict(boxstyle='round', facecolor='grey', alpha=0.5))

    if not waterfall:
        ax.legend()  # if waterfall, the index annotates the columns. If
                     # waterfall == False, the legend annotates the columns
    if ax0 is None:
        ax.set_title("Voi la")
        ax.set_xlabel("UltraWatts")
        plt.show()
    else:
        return ax

Sometimes it is more tedious and requires some custom functions to make the labels look all right, for example:

A, B = 80,80
n_units = df.shape[1]
cyclic_offset_x = -A*np.cos(2*np.pi / (2*n_units)  *np.arange(n_units))
cyclic_offset_y = B*np.sin(2*np.pi / (2*n_units) * np.arange(n_units)) + B/2
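As a rough illustration of what these offsets do, the sketch below evaluates the same formulas for a hypothetical `n_units = 4` and shows how an index wraps around the offset arrays. The offsets lie on half an ellipse with semi-axes `A` and `B` (in pixels), which fans neighbouring labels out so they are less likely to overlap; the choice of 4 units and 6 labels is an assumption made only for this example.

```python
import numpy as np

A, B = 80, 80  # semi-axes of the ellipse, in pixels
n_units = 4    # hypothetical number of columns (df.shape[1] in the answer)

# evenly spaced angles covering half a turn: 0, pi/4, pi/2, 3*pi/4
angles = 2 * np.pi / (2 * n_units) * np.arange(n_units)
cyclic_offset_x = -A * np.cos(angles)
cyclic_offset_y = B * np.sin(angles) + B / 2

# label k uses offset k modulo the array length, so a short array simply repeats
offsets = [
    (cyclic_offset_x[k % len(cyclic_offset_x)], cyclic_offset_y[k % len(cyclic_offset_y)])
    for k in range(6)
]
```

Here the first label is pushed to the left and slightly up, the middle ones mostly upward, and the fifth label reuses the first offset.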
