How to plot a stacked bar chart using pandas in Python
I have 3 dataframes of yearly data (one each for 2014, 2015 and 2016), each with 3 columns named 'PRACTICE', 'BNF NAME' and 'ITEMS'.
BNF NAME refers to drug names, and I am picking out 3 of them: Ampicillin, Amoxicillin and Co-Amoxiclav. This column includes different strengths/dosages (e.g. Co-Amoxiclav 200mg or Co-Amoxiclav 300mg) that I want to ignore, so I have used str.contains() to select these 3 drugs. ITEMS is the total number of prescriptions written for each drug.
I want to create a stacked bar chart with the x axis being the year (2014, 2015, 2016) and the y axis being the total number of prescriptions, with each of the 3 bars split into 3 segments, one per drug name.
I am assuming I need to use df.groupby() and maybe select on a partial string; however, I am unsure how to combine the yearly data and then how to group it to create the stacked bar chart.
Any guidance would be much appreciated.
This is the line of code I am using to select the rows for the 3 drug names only:
# 'a' and 'b' and 'c' evaluates to 'c' alone, so the names are joined with the regex alternation '|' instead
frame = frame[frame['BNF NAME'].str.contains('Ampicillin|Amoxicillin|Co-Amoxiclav')]
This is what each of the dataframes resembles:
PRACTICE | BNF NAME | ITEMS
Y00327 | Co-Amoxiclav_Tab 250mg/125mg | 23
Y00327 | Co-Amoxiclav_Susp 125mg/31mg/5ml S/F | 10
Y00327 | Co-Amoxiclav_Susp 250mg/62mg/5ml S/F | 6
Y00327 | Co-Amoxiclav_Susp 250mg/62mg/5ml | 1
Y00327 | Co-Amoxiclav_Tab 500mg/125mg | 50
There are likely a few different ways to accomplish this. Here's how I would do it. I'm using a Jupyter notebook, so your matplotlib imports may be different.
import pandas as pd
# IPython/Jupyter magic; drop this line when running as a plain script
%matplotlib
import matplotlib.pyplot as plt
import matplotlib
matplotlib.style.use('ggplot')
df = pd.DataFrame({'PRACTICE': ['Y00327', 'Y00327', 'Y00327', 'Y00327', 'Y00327'],
                   'BNF NAME': ['Co-Amoxiclav_Tab 250mg/125mg', 'Co-Amoxiclav_Susp 125mg/31mg/5ml S/F',
                                'Co-Amoxiclav_Susp 250mg/62mg/5ml S/F', 'Ampicillin 250mg/62mg/5ml',
                                'Amoxicillin_Tab 500mg/125mg'],
                   'ITEMS': [23, 10, 6, 1, 50]})
Out[52]:
                               BNF NAME  ITEMS PRACTICE
0          Co-Amoxiclav_Tab 250mg/125mg     23   Y00327
1  Co-Amoxiclav_Susp 125mg/31mg/5ml S/F     10   Y00327
2  Co-Amoxiclav_Susp 250mg/62mg/5ml S/F      6   Y00327
3             Ampicillin 250mg/62mg/5ml      1   Y00327
4           Amoxicillin_Tab 500mg/125mg     50   Y00327
To simulate your three dataframes:
df1 = df.copy()
df2 = df.copy()
df3 = df.copy()
Set a column indicating which year each dataframe represents:
df1['YEAR'] = 2014
df2['YEAR'] = 2015
df3['YEAR'] = 2016
Combining the three dataframes:
combined_df = pd.concat([df1, df2, df3], ignore_index=True)
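As a variation on the two steps above, the year tagging and the concatenation can be folded into one expression. This is only a sketch: the dict `frames_by_year` and the single-row sample frames are made up for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical stand-ins for the three yearly dataframes (made-up rows).
frames_by_year = {
    2014: pd.DataFrame({"BNF NAME": ["Ampicillin_Cap 250mg"], "ITEMS": [3]}),
    2015: pd.DataFrame({"BNF NAME": ["Ampicillin_Cap 250mg"], "ITEMS": [5]}),
    2016: pd.DataFrame({"BNF NAME": ["Ampicillin_Cap 250mg"], "ITEMS": [7]}),
}

# assign() returns a copy with the extra YEAR column, so the
# original yearly frames are left unmodified.
combined = pd.concat(
    [frame.assign(YEAR=year) for year, frame in frames_by_year.items()],
    ignore_index=True,
)
print(combined)
```

Keeping the frames in a dict keyed by year means a further year only needs one more dict entry.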
To set which drug each row represents:
combined_df['parsed_drug_name'] = "" # creates a blank column
amp_bool = combined_df['BNF NAME'].str.contains('Ampicillin', case=False)
combined_df.loc[amp_bool, 'parsed_drug_name'] = 'Ampicillin' # labels the row 'Ampicillin' if BNF NAME contains 'ampicillin' (case-insensitive)
amox_bool = combined_df['BNF NAME'].str.contains('Amoxicillin', case=False)
combined_df.loc[amox_bool, 'parsed_drug_name'] = 'Amoxicillin'
co_amox_bool = combined_df['BNF NAME'].str.contains('Co-Amoxiclav', case=False)
combined_df.loc[co_amox_bool, 'parsed_drug_name'] = 'Co-Amoxiclav'
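The three contains-and-assign steps above could also be collapsed into a single regular-expression extraction. The sketch below is an alternative, not the approach used in this answer; the `sample` frame, the `pattern` variable and the Paracetamol row are assumptions added for illustration. Unlike the `case=False` calls above, this pattern is case-sensitive, and rows that match none of the names get NaN rather than an empty string.

```python
import pandas as pd

sample = pd.DataFrame({
    "BNF NAME": [
        "Co-Amoxiclav_Tab 250mg/125mg",
        "Ampicillin 250mg/62mg/5ml",
        "Amoxicillin_Tab 500mg/125mg",
        "Paracetamol_Tab 500mg",  # made-up row that matches none of the drugs
    ],
    "ITEMS": [23, 1, 50, 99],
})

# One capture group listing the three drug names as alternatives.
pattern = r"(Co-Amoxiclav|Ampicillin|Amoxicillin)"

# With a single group and expand=False, str.extract returns a Series
# holding the matched name, or NaN where nothing matched.
sample["parsed_drug_name"] = sample["BNF NAME"].str.extract(pattern, expand=False)

# Keep only the rows for the three drugs of interest.
matched = sample.dropna(subset=["parsed_drug_name"])
print(matched)
```

Because the extracted text becomes the label directly, the strength and formulation suffixes are dropped in the same step.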
Finally, perform a pivot on the data, and plot the results:
combined_df.pivot_table(index='YEAR', columns='parsed_drug_name', values='ITEMS', aggfunc='sum').plot.bar(rot=0, stacked=True)
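Since the question mentions df.groupby(), it may help to see that the same year-by-drug table can be built with groupby followed by unstack. This is a sketch on made-up, already-labelled rows (the `toy` frame and its numbers are assumptions, not the real prescription data); `fill_value=0` is added here so that a drug missing from a year shows as 0 instead of NaN.

```python
import pandas as pd

# Made-up rows standing in for combined_df after the drug names are parsed.
toy = pd.DataFrame({
    "YEAR": [2014, 2014, 2014, 2015, 2015, 2016],
    "parsed_drug_name": ["Ampicillin", "Co-Amoxiclav", "Co-Amoxiclav",
                         "Ampicillin", "Amoxicillin", "Co-Amoxiclav"],
    "ITEMS": [1, 23, 10, 4, 50, 6],
})

# groupby route: sum ITEMS per (year, drug), then move the drug level
# of the index into columns.
by_groupby = (
    toy.groupby(["YEAR", "parsed_drug_name"])["ITEMS"]
    .sum()
    .unstack(fill_value=0)
)

# pivot_table route, as in the line above, with missing cells filled.
by_pivot = toy.pivot_table(
    index="YEAR", columns="parsed_drug_name", values="ITEMS",
    aggfunc="sum", fill_value=0,
)

print(by_groupby)
print(by_pivot)
```

Either table has one row per year and one column per drug, which is the shape `.plot.bar(rot=0, stacked=True)` turns into one stacked bar per year.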