[英]How to plot a stacked bar chart for pandas grouped features?
我有以下数据框:
Part_Number Serial_Number Timestamp Feature Machine Tool Rework
PN1 100 9/9/2020 8:26 FEAT_FN_H30 H10 93 1
PN1 101 9/9/2020 9:05 FEAT1_FN_H12 G3 85 2
PN1 102 9/9/2020 9:29 FEAT_FN_H23 H4 81 1
PN1 103 9/9/2020 10:53 FEAT1_FN_H15 H7 24 3
PN1 104 9/9/2020 10:53 FEAT1_FN_H17 H7 24 3
PN1 105 9/9/2020 10:53 FEAT1_FN_H19 24 1
PN1 106 9/9/2020 12:03 FEAT1_LF_LOC_BE F7 6 2
PN1 107 9/9/2020 12:03 FEAT1_LF_LOC_FN 12
PN1 108 9/9/2020 12:33 FEAT_FN_H42 93 2
PN1 109 9/9/2020 12:33 FEAT_FN_H43 H10 83 1
PN1 110 9/9/2020 12:33 FEAT_FN_H44 H10 83 1
PN1 111 9/9/2020 12:33 FEAT_FN_H45 H10 83 1
PN1 112 9/9/2020 12:33 FEAT_FN_H46 H10 83 1
我需要的是一个条形图,其中包含为每个功能执行的返工总和。
然后,我需要根据对该功能执行这些返工的机器来细分该条。 我试图上传我正在寻找的照片,但一直收到错误消息。
pandas.DataFrame.groupby
所需的特征并聚合sum
.unstack
分组的DataFrame
从长格式到宽格式。pandas.DataFrame.plot
绘制堆叠条形图
pandas.DataFrame.plot.bar
'Feature'
,请使用.barh
作为水平条,根据需要调整figsize
,并将plt.xlabel('Rework')
更改为ylabel
。import pandas as pd
import matplotlib.pyplot as plt
# read data
df = pd.read_csv('test.csv')
# display(df)
Part_Number Serial_Number Timestamp Feature Machine Tool Rework
0 PN1 100 9/9/2020 8:26 FEAT_FN_H30 H10 93.0 1
1 PN1 101 9/9/2020 9:05 FEAT1_FN_H12 G3 85.0 2
2 PN1 101 9/9/2020 9:05 FEAT1_FN_H12 G3 85.0 3
3 PN1 101 9/9/2020 9:05 FEAT1_FN_H12 G4 85.0 2
4 PN1 102 9/9/2020 9:29 FEAT_FN_H23 H4 81.0 1
5 PN1 103 9/9/2020 10:53 FEAT1_FN_H15 H7 24.0 3
6 PN1 104 9/9/2020 10:53 FEAT1_FN_H17 H7 24.0 3
7 PN1 105 9/9/2020 10:53 FEAT1_FN_H19 NaN 24.0 1
8 PN1 106 9/9/2020 12:03 FEAT1_LF_LOC_BE F7 6.0 2
9 PN1 107 9/9/2020 12:03 FEAT1_LF_LOC_FN NaN NaN 12
10 PN1 108 9/9/2020 12:33 FEAT_FN_H42 NaN 93.0 2
11 PN1 109 9/9/2020 12:33 FEAT_FN_H43 H10 83.0 1
12 PN1 110 9/9/2020 12:33 FEAT_FN_H44 H10 83.0 1
13 PN1 111 9/9/2020 12:33 FEAT_FN_H45 H10 83.0 1
14 PN1 112 9/9/2020 12:33 FEAT_FN_H46 H10 83.0 1
# groupby and sum
dfg = df.groupby(['Feature', 'Machine']).agg({'Rework': sum}).unstack()
# fix column names
dfg.columns = dfg.columns.droplevel(0)
# sort the index by the sum of the rows
dfg = dfg.reindex(dfg.sum(axis=1).sort_values(ascending=False).index)
# plot the top 5
dfg.head().plot.bar(stacked=True, figsize=(8, 5), grid=True)
plt.ylabel('Rework')
plt.legend(title=dfg.columns.name, bbox_to_anchor=(1.05, 1), loc='upper left')
'Machine'
生成的具有'Feature'
示例。Part_Number,Serial_Number,Timestamp,Feature,Machine,Tool,Rework
PN1,100,9/9/2020 8:26,FEAT_FN_H30,H10,93,1
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G3,85,2
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G3,85,3
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G4,85,2
PN1,102,9/9/2020 9:29,FEAT_FN_H23,H4,81,1
PN1,103,9/9/2020 10:53,FEAT1_FN_H15,H7,24,3
PN1,104,9/9/2020 10:53,FEAT1_FN_H17,H7,24,3
PN1,105,9/9/2020 10:53,FEAT1_FN_H19,,24,1
PN1,106,9/9/2020 12:03,FEAT1_LF_LOC_BE,F7,6,2
PN1,107,9/9/2020 12:03,FEAT1_LF_LOC_FN,,,12
PN1,108,9/9/2020 12:33,FEAT_FN_H42,,93,2
PN1,109,9/9/2020 12:33,FEAT_FN_H43,H10,83,1
PN1,110,9/9/2020 12:33,FEAT_FN_H44,H10,83,1
PN1,111,9/9/2020 12:33,FEAT_FN_H45,H10,83,1
PN1,112,9/9/2020 12:33,FEAT_FN_H46,H10,83,1
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.