
How to make a stacked bar plot for percentage of classes per year

I need to make a stacked bar plot using this dataset (first rows shown):

import pandas as pd

data = {'model': ['A1', 'A6', 'A1', 'A4', 'A3'],
        'year': [2017, 2016, 2016, 2017, 2019],
        'price': [12500, 16500, 11000, 16800, 17300],
        'transmission': ['Manual', 'Automatic', 'Manual', 'Automatic', 'Manual'],
        'mileage': [15735, 36203, 29946, 25952, 1998],
        'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
        'tax': [150, 20, 30, 145, 145],
        'mpg': [55.4, 64.2, 55.4, 67.3, 49.6],
        'engineSize': [1.4, 2.0, 1.4, 2.0, 1.0]}

df = pd.DataFrame(data)

  model  year  price transmission  mileage fuelType  tax   mpg  engineSize
0    A1  2017  12500       Manual    15735   Petrol  150  55.4         1.4
1    A6  2016  16500    Automatic    36203   Diesel   20  64.2         2.0
2    A1  2016  11000       Manual    29946   Petrol   30  55.4         1.4
3    A4  2017  16800    Automatic    25952   Diesel  145  67.3         2.0
4    A3  2019  17300       Manual     1998   Petrol  145  49.6         1.0

I would like the years (1997-2021) on the x-axis and percentages from 0 to 100 on the y-axis. Each bar should show the yearly proportions of the three fuelTypes: Petrol, Diesel and Hybrid.

I've already done the following calculation to get the percentage per fuelType per year, and now I need to put it on a graph:

fuel_percentage = round((my_data_frame.groupby(['year'])['fuelType'].value_counts()/my_data_frame.groupby('year')['fuelType'].count())*100, 2)

print(fuel_percentage)

Which gives me the following result:

year  fuelType
1997  Petrol      100.00
1998  Petrol      100.00
2002  Petrol      100.00
2003  Diesel       66.67
      Petrol       33.33
2004  Petrol       80.00
      Diesel       20.00
2005  Petrol       71.43
      Diesel       28.57
2006  Petrol       66.67
      Diesel       33.33
2007  Petrol       56.25
      Diesel       43.75
2008  Diesel       66.67
      Petrol       33.33
etc...

My main worry is that since the object is not a dataframe I won't be able to use it to make a plot.

Here is an example of the kind of plot I would like (replace players with fuelTypes and the y-axis with percentages): [image]

Thanks for the help!

Edit: [image]

  • Tested in python 3.8.11, pandas 1.3.3, matplotlib 3.4.3

.groupby & .unstack

  • pandas.DataFrame.groupby followed by .value_counts creates a long-form Series, indexed by year and fuelType, that must be unstacked to a wide DataFrame to work easily with the plotting API
import pandas as pd

# I'm not a fan of this option because it requires doing .groupby twice
# calculate percent with groupby
dfc = (df.groupby(['year'])['fuelType'].value_counts() / df.groupby('year')['fuelType'].count()).mul(100).round(1)

# unstack the long result into a wide dataframe (one column per fuelType)
dfc = dfc.unstack(level=1)
  • .groupby with .value_counts(normalize=True) and .unstack gets the same percentages with a single groupby
dfc = df.groupby(['year'])['fuelType'].value_counts(normalize=True).mul(100).round(1).unstack(level=1)

.crosstab

# get the normalized value counts by index
dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)
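
As a quick check on the five sample rows, the sketch below compares the single-groupby route with the crosstab route. The names long_form, wide_groupby and wide_crosstab are made up for this illustration. One difference I'd expect: a combination that never occurs (Diesel in 2019 here) is simply absent after .value_counts, so .unstack leaves NaN unless fill_value=0 is passed, whereas pd.crosstab fills it with 0.

```python
import pandas as pd

# Sample rows from the question (only the two columns needed here)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

# Route 1: groupby + value_counts gives a Series with a (year, fuelType) MultiIndex
long_form = df.groupby('year')['fuelType'].value_counts(normalize=True).mul(100).round(1)

# unstack moves fuelType into the columns; fill_value=0 replaces the
# missing (2019, Diesel) combination, which would otherwise be NaN
wide_groupby = long_form.unstack(level=1, fill_value=0)

# Route 2: crosstab returns the wide table directly
wide_crosstab = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)

print(type(long_form).__name__)
print(wide_groupby)
print((wide_groupby.values == wide_crosstab.values).all())
```

Either wide table can be passed to .plot in the same way.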

Plot

# display(dfc)
fuelType  Diesel  Petrol
year                    
2016        50.0    50.0
2017        50.0    50.0
2019         0.0   100.0

# plot bar
ax = dfc.plot(kind='bar', ylabel='Percent(%)', stacked=True, rot=0, figsize=(10, 4))

[image: stacked bar plot]

  • xticks=dfc.index places a tick only at the years present in the data. Remove it to let the plotting API choose the tick locations, which gives more values on the x-axis.
# plot area
ax = dfc.plot(kind='area', ylabel='Percent(%)', rot=0, figsize=(10, 4), xticks=dfc.index)

[image: area plot]
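
The question asks for every year in the range on the x-axis and a 0 to 100 y-axis. A bar plot only draws the years that appear in the index, so one possible approach (a sketch, not tested on the full dataset) is to reindex dfc over the whole range before plotting. The names all_years and dfc_full are made up for this illustration, and drawing years without any rows as empty bars is my assumption about the desired result.

```python
import pandas as pd

# Sample rows from the question (only the two columns needed here)
df = pd.DataFrame({
    'year': [2017, 2016, 2016, 2017, 2019],
    'fuelType': ['Petrol', 'Diesel', 'Petrol', 'Diesel', 'Petrol'],
})

dfc = pd.crosstab(df.year, df.fuelType, normalize='index').mul(100).round(1)

# Assumption: cover every year from the first to the last one observed;
# years with no rows (2018 in this sample) become all-zero rows
all_years = pd.RangeIndex(int(dfc.index.min()), int(dfc.index.max()) + 1, name='year')
dfc_full = dfc.reindex(all_years, fill_value=0)

# stacked bars, with the y-axis fixed to the full percentage range
ax = dfc_full.plot(kind='bar', stacked=True, rot=0, figsize=(10, 4), ylabel='Percent(%)')
ax.set_ylim(0, 100)
ax.legend(title='fuelType', bbox_to_anchor=(1, 1), loc='upper left')
```

With the full dataset the same min/max logic should span 1997 to 2021, and a Hybrid column appears automatically once the data contains that fuelType.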
