How to customize pandas bar plot text annotations using related column values?

Question

I am trying to build a stacked bar plot with customized text in the annotations. The bar plot is built from the "morning_sales" and "afternoon_sales" entries for a list of store locations, and I would like to build a custom label for each box that shows the height of the box and a related value from another column (in this case, matching "morning_staff" with "morning_sales" and "afternoon_staff" with "afternoon_sales").

My method works, but it relies on knowing the order of the bar plot rectangles. I'm concerned that it may fall apart if I re-order the bars or do any related manipulation. Can anyone recommend a better way to do this? Note that this is a "dummy" dataframe; my true dataset is several hundred thousand rows.

I am not sure whether there is a way to extract the text using the "handles, labels = ax.get_legend_handles_labels()" method?

Here is the code:

import pandas as pd

data = {'location': ['Toronto', 'Vancouver', 'Edmonton', 'Calgary'],
        'morning_staff': [3, 12, 25, 6],
        'afternoon_staff': [2, 8, None, 8],
        'morning_sales': [8000, 25000, 40000, 15000],
        'afternoon_sales': [4000, 15000, None, 6000]
}
df = pd.DataFrame(data, columns = ['location', 'morning_staff', 'afternoon_staff', 'morning_sales', 'afternoon_sales' ])

# > Drop 'Calgary' from plot dataset and extract columns for plotting
df_plot = df.loc[df['location'] != 'Calgary', ['location', 'morning_sales', 'afternoon_sales']]
ax = df_plot.plot.bar(x='location', stacked=True, figsize=(8,6), colormap='tab10', fontsize=14)

# Add an annotation to each bar -> Showing staff required for sales
col_tags = ['morning_staff', 'afternoon_staff']
locations = df_plot['location'].tolist()
bar_labels = []
for col_tag in col_tags:   # morning_staff, afternoon_staff (same order as the plotted sales columns)
    for location in locations:
        idx = df.loc[df['location'] == location].index[0]
        bar_label = df.loc[idx, col_tag].item()
        bar_labels.append(bar_label)

rects = ax.patches
for rect, bar_label in zip(rects, bar_labels):
    width, height = rect.get_width(), rect.get_height()
    # NaN never compares equal to anything, so test for it with pd.notna
    if height != 0 and pd.notna(bar_label):
        x, y = rect.get_xy()
        text = f'{int(bar_label)}: {int(height)}'
        ax.text(x + width / 2,
                y + height / 2,
                text,
                horizontalalignment='center',
                verticalalignment='center',
                fontsize=12)

Answer 1

import pandas as pd
import matplotlib.pyplot as plt

data = {'location': ['Toronto', 'Vancouver', 'Edmonton', 'Calgary'],
        'morning_staff': [3, 12, 25, 6],
        'afternoon_staff': [2, 8, None, 8],
        'morning_sales': [8000, 25000, 40000, 15000],
        'afternoon_sales': [4000, 15000, None, 6000]
}
df = pd.DataFrame.from_dict(data)
df.set_index('location', inplace=True)
# Nullable integer dtype keeps the missing value without turning the column into floats
df['afternoon_staff'] = df['afternoon_staff'].astype('Int64')
print(df)

df_plot = df.iloc[:-1, :]  # skip the last row (Calgary) by position
df_plot.iloc[:, 2:].plot(kind='bar', stacked=True, colormap='tab10')
for i in range(len(df_plot)):
    morning_sales = df_plot['morning_sales'].iloc[i]
    afternoon_sales = df_plot['afternoon_sales'].iloc[i]
    morning_label = str(df_plot['morning_staff'].iloc[i]) + ':' + str(morning_sales)
    plt.annotate(morning_label, (i - 0.2, morning_sales / 2))
    if pd.notna(afternoon_sales):  # no afternoon box to label when sales are missing
        afternoon_label = str(df_plot['afternoon_staff'].iloc[i]) + ':' + str(afternoon_sales)
        plt.annotate(afternoon_label, (i - 0.2, morning_sales + afternoon_sales / 2))
plt.tight_layout()

Output:

              morning_staff  afternoon_staff  morning_sales  afternoon_sales
location                                                                 
Toronto                3                2           8000           4000.0
Vancouver             12                8          25000          15000.0
Edmonton              25             <NA>          40000              NaN
Calgary                6                8          15000           6000.0
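
The loop above still depends on the row position i. As a possible order-independent variant (a sketch, not part of the original answer), the bar containers returned by ax.get_legend_handles_labels() can be paired with their column names, and each rectangle can be mapped back to its location through the x tick labels. This assumes the staff column name can be derived from the sales column name by replacing "_sales" with "_staff", and that "location" is a unique index; the annotations dict is only there so the result can be inspected.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd

data = {'location': ['Toronto', 'Vancouver', 'Edmonton', 'Calgary'],
        'morning_staff': [3, 12, 25, 6],
        'afternoon_staff': [2, 8, None, 8],
        'morning_sales': [8000, 25000, 40000, 15000],
        'afternoon_sales': [4000, 15000, None, 6000]}
df = pd.DataFrame(data).set_index('location')
df_plot = df.drop(index='Calgary')

ax = df_plot[['morning_sales', 'afternoon_sales']].plot.bar(
    stacked=True, figsize=(8, 6), colormap='tab10', fontsize=14)

# pandas sets the tick labels explicitly, so position k on the x axis is this location
tick_locations = [tick.get_text() for tick in ax.get_xticklabels()]

# For bar plots the legend handles are the bar containers, labelled with the column name
handles, labels = ax.get_legend_handles_labels()

annotations = {}  # (location, sales column) -> label text, kept for inspection
for container, sales_col in zip(handles, labels):
    staff_col = sales_col.replace('_sales', '_staff')  # assumed naming convention
    for rect in container.patches:
        x_center = rect.get_x() + rect.get_width() / 2
        location = tick_locations[int(round(x_center))]
        staff = df.loc[location, staff_col]
        sales = df.loc[location, sales_col]
        if pd.isna(staff) or pd.isna(sales):
            continue  # nothing to label for a missing box
        text = f'{int(staff)}: {int(sales)}'
        annotations[(location, sales_col)] = text
        ax.text(x_center, rect.get_y() + rect.get_height() / 2, text,
                horizontalalignment='center', verticalalignment='center',
                fontsize=12)
```

Because every label is looked up by location name and column name rather than by position in ax.patches, sorting the rows or swapping the column order before plotting should not change which text lands on which box.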
