简体   繁体   English

如何为熊猫分组特征绘制堆积条形图?

[英]How to plot a stacked bar chart for pandas grouped features?

I have the following dataframe:我有以下数据框:

Part_Number    Serial_Number    Timestamp    Feature    Machine    Tool    Rework
PN1    100    9/9/2020 8:26    FEAT_FN_H30    H10    93    1
PN1    101    9/9/2020 9:05    FEAT1_FN_H12    G3    85    2
PN1    102    9/9/2020 9:29    FEAT_FN_H23    H4    81    1
PN1    103    9/9/2020 10:53    FEAT1_FN_H15    H7    24    3
PN1    104    9/9/2020 10:53    FEAT1_FN_H17    H7    24    3
PN1    105    9/9/2020 10:53    FEAT1_FN_H19        24    1
PN1    106    9/9/2020 12:03    FEAT1_LF_LOC_BE    F7    6    2
PN1    107    9/9/2020 12:03    FEAT1_LF_LOC_FN            12
PN1    108    9/9/2020 12:33    FEAT_FN_H42        93    2
PN1    109    9/9/2020 12:33    FEAT_FN_H43    H10    83    1
PN1    110    9/9/2020 12:33    FEAT_FN_H44    H10    83    1
PN1    111    9/9/2020 12:33    FEAT_FN_H45    H10    83    1
PN1    112    9/9/2020 12:33    FEAT_FN_H46    H10    83    1

What I need is a bar chart with the sum of reworks performed for each feature.我需要的是一个条形图,其中包含为每个功能执行的返工总和。

I then need that bar broken down by what machine performed those reworks on that feature.然后,我需要根据对该功能执行这些返工的机器来细分该条。 I attempted to upload a photo of what I am looking for but kept receiving an error.我试图上传我正在寻找的照片,但一直收到错误消息。

  • pandas.DataFrame.groupby the desired features and aggregate the sum pandas.DataFrame.groupby所需的特征并聚合sum
  • .unstack the grouped DataFrame from a long, to a wide format. .unstack分组的DataFrame从长格式到宽格式。
  • Remove level 0 of the multi-level column names created when unstacking移除在拆垛时创建的多级列名的第 0 级
  • Plot the stacked bar, with pandas.DataFrame.plot使用pandas.DataFrame.plot绘制堆叠条形图
    • pandas.DataFrame.plot.bar
    • If you have a lot of 'Feature' , use .barh for a horizontal bar, adjust figsize as needed, and change plt.xlabel('Rework') to ylabel .如果您有很多'Feature' ,请使用.barh作为水平条,根据需要调整figsize ,并将plt.xlabel('Rework')更改为ylabel
import pandas as pd
import matplotlib.pyplot as plt

# read data
df = pd.read_csv('test.csv')

# display(df)
   Part_Number  Serial_Number       Timestamp          Feature Machine  Tool  Rework
0          PN1            100   9/9/2020 8:26      FEAT_FN_H30     H10  93.0       1
1          PN1            101   9/9/2020 9:05     FEAT1_FN_H12      G3  85.0       2
2          PN1            101   9/9/2020 9:05     FEAT1_FN_H12      G3  85.0       3
3          PN1            101   9/9/2020 9:05     FEAT1_FN_H12      G4  85.0       2
4          PN1            102   9/9/2020 9:29      FEAT_FN_H23      H4  81.0       1
5          PN1            103  9/9/2020 10:53     FEAT1_FN_H15      H7  24.0       3
6          PN1            104  9/9/2020 10:53     FEAT1_FN_H17      H7  24.0       3
7          PN1            105  9/9/2020 10:53     FEAT1_FN_H19     NaN  24.0       1
8          PN1            106  9/9/2020 12:03  FEAT1_LF_LOC_BE      F7   6.0       2
9          PN1            107  9/9/2020 12:03  FEAT1_LF_LOC_FN     NaN   NaN      12
10         PN1            108  9/9/2020 12:33      FEAT_FN_H42     NaN  93.0       2
11         PN1            109  9/9/2020 12:33      FEAT_FN_H43     H10  83.0       1
12         PN1            110  9/9/2020 12:33      FEAT_FN_H44     H10  83.0       1
13         PN1            111  9/9/2020 12:33      FEAT_FN_H45     H10  83.0       1
14         PN1            112  9/9/2020 12:33      FEAT_FN_H46     H10  83.0       1

# groupby and sum
dfg = df.groupby(['Feature', 'Machine']).agg({'Rework': sum}).unstack()

# fix column names
dfg.columns = dfg.columns.droplevel(0)

# sort the index by the sum of the rows
dfg = dfg.reindex(dfg.sum(axis=1).sort_values(ascending=False).index)

# plot the top 5
dfg.head().plot.bar(stacked=True, figsize=(8, 5), grid=True)
plt.ylabel('Rework')
plt.legend(title=dfg.columns.name, bbox_to_anchor=(1.05, 1), loc='upper left')

在此处输入图片说明

Test data测试数据

  • Extra data was added for testing, because there where no examples with a 'Feature' being produced by more than one 'Machine' .添加了额外的数据用于测试,因为在那里没有由多个'Machine'生成的具有'Feature'示例。
Part_Number,Serial_Number,Timestamp,Feature,Machine,Tool,Rework
PN1,100,9/9/2020 8:26,FEAT_FN_H30,H10,93,1
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G3,85,2
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G3,85,3
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G4,85,2
PN1,102,9/9/2020 9:29,FEAT_FN_H23,H4,81,1
PN1,103,9/9/2020 10:53,FEAT1_FN_H15,H7,24,3
PN1,104,9/9/2020 10:53,FEAT1_FN_H17,H7,24,3
PN1,105,9/9/2020 10:53,FEAT1_FN_H19,,24,1
PN1,106,9/9/2020 12:03,FEAT1_LF_LOC_BE,F7,6,2
PN1,107,9/9/2020 12:03,FEAT1_LF_LOC_FN,,,12
PN1,108,9/9/2020 12:33,FEAT_FN_H42,,93,2
PN1,109,9/9/2020 12:33,FEAT_FN_H43,H10,83,1
PN1,110,9/9/2020 12:33,FEAT_FN_H44,H10,83,1
PN1,111,9/9/2020 12:33,FEAT_FN_H45,H10,83,1
PN1,112,9/9/2020 12:33,FEAT_FN_H46,H10,83,1

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM