
How to create a 100% stacked barplot from a categorical dataframe

I have a dataframe structured like this:

User   Food 1    Food 2    Food 3    Food 4
Steph  Onions    Tomatoes  Cabbages  Potatoes
Tom    Potatoes  Tomatoes  Potatoes  Potatoes
Fred   Carrots   Cabbages            Eggplant
Phil   Onions    Eggplant  Eggplant

I want to use the distinct values from across the food columns as categories. I then want to create a Seaborn plot so the % of each category for each column is plotted as a 100% horizontal stacked bar.

My attempt to do this:

data = {
    'User' : ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1' : ["Onions", "Potatoes", "Carrots", "Onions"],
    'Food 2' : ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3' : ["Cabbages", "Potatoes", "", "Eggplant"],
    'Food 4' : ['Potatoes', 'Potatoes', 'Eggplant', ''],    
}

df = pd.DataFrame(data)

x_ax = ["Onions", "Potatoes", "Carrots", "Onions", "", 'Eggplant', "Cabbages"]

df.plot(kind="barh", x=x_ax, y=["Food 1", "Food 2", "Food 3", "Food 4"], stacked=True, ax=axes[1])

plt.show()

  1. Replace '' with np.nan, because empty strings would otherwise be counted as a category.
  2. Use pandas.DataFrame.melt to convert the dataframe to long form.
  3. Use pandas.crosstab to get a frequency count table.
  4. Get the total for each row.
  5. Calculate the category percent for each row.
  6. Plot the dataframe with pandas.DataFrame.plot and kind='barh'.
    • Putting the food names on the x-axis is not the correct way to create a 100% stacked bar plot. One axis must be numeric. The bars will be colored by food type.
  7. Annotate the bars with Axes.bar_label, based on this answer.
  8. Move the legend outside the plot with bbox_to_anchor, based on this answer.
  • seaborn is a high-level API for matplotlib, and pandas uses matplotlib as its default plotting backend, so it's easier to produce a stacked bar plot directly with pandas.
    • seaborn doesn't support stacked barplots unless histplot is used in a hacked way, as shown in this answer, which would require an extra step of melting percent.
  • Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1
    • Assignment expressions (:=) require python >= 3.8. Otherwise, use [f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c].

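import pandas as pd
import numpy as np

# using the dataframe in the OP

# 1. replace empty strings with NaN so they aren't counted as a food type
df = df.replace('', np.nan)

# 2. melt to long form: one row per (User, Food column, Type)
dfm = df.melt(id_vars='User', var_name='Food', value_name='Type')

# 3. frequency count of each Type within each Food column
ct = pd.crosstab(dfm.Food, dfm.Type)

# 4. number of non-missing entries in each Food column
total = ct.sum(axis=1)

# 5. convert the counts to row-wise percentages
percent = ct.div(total, axis=0).mul(100).round(2)

# 6. horizontal stacked bars, one bar per Food column
ax = percent.plot(kind='barh', stacked=True, figsize=(8, 6))

# 7. label each bar segment with its percent
for c in ax.containers:

    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c]

    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')

# 8. place the legend outside the plot area
ax.legend(bbox_to_anchor=(1, 1.02), loc='upper left')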

[Image: the resulting 100% stacked horizontal bar plot]

DataFrame Views

dfm

     User    Food      Type
0   Steph  Food 1    Onions
1     Tom  Food 1  Potatoes
2    Fred  Food 1   Carrots
3    Phil  Food 1    Onions
4   Steph  Food 2  Tomatoes
5     Tom  Food 2  Tomatoes
6    Fred  Food 2  Cabbages
7    Phil  Food 2  Eggplant
8   Steph  Food 3  Cabbages
9     Tom  Food 3  Potatoes
10   Fred  Food 3       NaN
11   Phil  Food 3  Eggplant
12  Steph  Food 4  Potatoes
13    Tom  Food 4  Potatoes
14   Fred  Food 4  Eggplant
15   Phil  Food 4       NaN

ct

Type    Cabbages  Carrots  Eggplant  Onions  Potatoes  Tomatoes
Food                                                           
Food 1         0        1         0       2         1         0
Food 2         1        0         1       0         0         2
Food 3         1        0         1       0         1         0
Food 4         0        0         1       0         2         0
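
To see why step 1 matters, here is a small sketch comparing the count table with and without the '' replacement. The helper name count_table is made up for this illustration; the data is the dataframe from the question.

```python
import numpy as np
import pandas as pd

# same data as in the question; '' marks a missing food
df = pd.DataFrame({
    'User': ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1': ['Onions', 'Potatoes', 'Carrots', 'Onions'],
    'Food 2': ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3': ['Cabbages', 'Potatoes', '', 'Eggplant'],
    'Food 4': ['Potatoes', 'Potatoes', 'Eggplant', ''],
})

def count_table(frame):
    # hypothetical helper: melt to long form, then count each Type per Food column
    long_df = frame.melt(id_vars='User', var_name='Food', value_name='Type')
    return pd.crosstab(long_df['Food'], long_df['Type'])

ct_raw = count_table(df)                        # '' is kept as its own category
ct_clean = count_table(df.replace('', np.nan))  # NaN rows are left out of the counts

print(ct_raw.columns.tolist())
print(ct_raw.sum(axis=1).tolist())
print(ct_clean.sum(axis=1).tolist())
```

Without the replacement, '' shows up as an extra column and every row total is 4, so Food 3 and Food 4 would each get a 25% segment for the blank entry instead of being split among three real foods.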

total

Food
Food 1    4
Food 2    4
Food 3    3
Food 4    3
dtype: int64

percent

Type    Cabbages  Carrots  Eggplant  Onions  Potatoes  Tomatoes
Food                                                           
Food 1      0.00     25.0      0.00    50.0     25.00       0.0
Food 2     25.00      0.0     25.00     0.0      0.00      50.0
Food 3     33.33      0.0     33.33     0.0     33.33       0.0
Food 4      0.00      0.0     33.33     0.0     66.67       0.0
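
As a possible shortcut, steps 3 to 5 can be collapsed into a single call by passing normalize='index' to pandas.crosstab, which divides each row by its own total. This is a sketch of an alternative, not what the answer above uses; it rebuilds the question's dataframe so it runs on its own.

```python
import numpy as np
import pandas as pd

# same data as in the question; '' marks a missing food
df = pd.DataFrame({
    'User': ['Steph', 'Tom', 'Fred', 'Phil'],
    'Food 1': ['Onions', 'Potatoes', 'Carrots', 'Onions'],
    'Food 2': ['Tomatoes', 'Tomatoes', 'Cabbages', 'Eggplant'],
    'Food 3': ['Cabbages', 'Potatoes', '', 'Eggplant'],
    'Food 4': ['Potatoes', 'Potatoes', 'Eggplant', ''],
})

dfm = df.replace('', np.nan).melt(id_vars='User', var_name='Food', value_name='Type')

# normalize='index' returns row-wise fractions, so multiply by 100 for percent
percent = pd.crosstab(dfm['Food'], dfm['Type'], normalize='index').mul(100).round(2)

# sanity check: each row should add up to about 100 (rounding can leave it slightly off)
print(percent.sum(axis=1))
```

The separate ct and total tables are still useful when the raw counts are needed, for example to put counts rather than percents in the bar labels.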
