Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet

Question

I have this example dataset

products = ["A", "B", "C", "D"]
stores = ["store1", "store2", "store3"]
n = 30

product_list = [products[i] for i in np.random.randint(0, len(products), n)]
store_list = [stores[i] for i in np.random.randint(0, len(stores), n)]
rating_list = np.random.random(n) * 5
sales_list = np.random.random(n) * 10000

df = pd.DataFrame(
    {'store': store_list, 
     'product': product_list, 
     'sales': sales_list, 
     'rating': rating_list})

and then sum the sales

df_1=df.groupby(['store','product']).agg({'sales':['sum']})
df_1

在此处输入图像描述

and ordered it by highest sales while maintain the store

df_2 = df_1.groupby(level=0, group_keys=False).apply(
                   lambda x: x.sort_values(('sales', 'sum'), ascending=False))
df_2

在此处输入图像描述

How can I facet by the store, so the resulting visualization is like the following?

在此处输入图像描述

Answer 1

You should reset the index in the last passage:

df_2 = df_1.groupby(level=0, group_keys=False).apply(
                   lambda x: x.sort_values(('sales', 'sum'), ascending=False)).reset_index()

Then you can plot with seaborn.FacetGrid :

g = sns.FacetGrid(df_2, col = 'store')
g.map(sns.barplot, 'product', 'sales')

plt.show()

Answer 2

Visualizations are about telling a story, and presenting data in a clear and concise manner to convey that story. As such, it is more clear to have the same xaxis order for each plot.
- Someone looking at your visualization should be able to quickly discern which product from which store has the highest total sales, but that is not easy if the product categories for every axis are not in the same order
This can be done with pandas.DataFrame.plot by shaping the data with pandas.DataFrame.pivot_table .
Tested with python 3.8.11 , matplotlib 3.4.2 , seaborn 0.11.2 , and pandas 1.3.1 .

import pandas as pd
import matplotlib.pyplot as plt

# using the sample data; reshape df
dfp = df.pivot_table(index='product', columns='store', values='sales', aggfunc='sum')

# display(dfp)
store          store1        store2        store3
product                                          
A         9303.543781  15323.422183  20738.561588
B                 NaN   7549.028221           NaN
C        13976.321362  22350.050356   9865.392344
D         6905.455849   3183.767513   6010.941242

# plot
dfp.plot(kind='bar', subplots=True, layout=(1, 3), figsize=(8, 4), legend=False, rot=0,
         sharey=True, title='Store Sales by Product', ylabel='Total Sales')
plt.show()

This presentation is more clear without subplots (remove subplots=True )
- It's more clear that product B only had sales at store 2

dfp.plot(kind='bar', rot=0, figsize=(5, 3), title='Store Sales by Product', ylabel='Total Sales')
plt.show()

Switching the categories for index and columns tells a different story

dfp = df.pivot_table(index='store', columns='product', values='sales', aggfunc='sum')

dfp.plot(kind='bar', rot=0, figsize=(5, 3), title='Product Sales by Store', ylabel='Total Sales')
plt.show()

`seaborn.catplot`

With .catplot this can be done without .groupby or .pivot_table because kind='bar' has an estimator parameter.
Using col=

import seaborn as sns

sns.catplot(kind='bar', data=df, col='store', x='product', y='sales',
            order=sorted(products), col_order=sorted(stores), estimator=sum, ci=False, height=3)
plt.show()

Using hue=
FYI, the random data ( df ) for this plot is different than the other plots.

sns.catplot(kind='bar', data=df, hue='store', x='product', y='sales', height=3,
            col_order=sorted(stores), estimator=sum, ci=False, order=sorted(products))
plt.show()

Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet

Question

2 answers

solution1
3 ACCPTED 2021-08-19 07:23:10

solution2
2 2021-08-20 23:07:24

`seaborn.catplot`

Pandas groupby sort within groups retaining multiple aggregates and visualize it with facet

Question

2 answers

solution1 3 ACCPTED 2021-08-19 07:23:10

solution2 2 2021-08-20 23:07:24

seaborn.catplot

solution1
3 ACCPTED 2021-08-19 07:23:10

solution2
2 2021-08-20 23:07:24

`seaborn.catplot`