
Plotting: qcut then groupby two variables

I have the following dataset:

import numpy as np
import pandas as pd

df = pd.DataFrame({'cls': [1,2,2,1,2,1,2,1,2,1,2],
                   'x': [10,11,21,21,8,1,4,3,5,6,2],
                   'y': [10,1,2,2,5,2,4,3,8,6,5]})

df['bin'] = pd.qcut(np.array(df['x']), 4)
a = df.groupby(['bin', 'cls'])['y'].mean()
a

This gives me

bin           cls
(0.999, 3.5]  1       2.5
              2       5.0
(3.5, 6.0]    1       6.0
              2       6.0
(6.0, 10.5]   1      10.0
              2       5.0
(10.5, 21.0]  1       2.0
              2       1.5
Name: y, dtype: float64

I want to plot the right-most column, that is, the mean of y for each cls within each bin. Each bin has two values of y (one per cls), and I would like to plot them as points (a scatter). Is that possible using matplotlib or seaborn?

You can indeed use seaborn for what you're asking. Does this work?

# import libraries
import matplotlib.pyplot as plt
import seaborn as sns

# set up some plotting options
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(1,1,1)

# we reset index to avoid having to do multi-indexing
a = a.reset_index()

# use seaborn with argument 'hue' to do the grouping
sns.barplot(x="bin", y="y", hue="cls", data=a, ax=ax)
plt.show() 


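EDIT: I've just noticed that you wanted to plot "points". I wouldn't advise it for this dataset, but you can do that by replacing barplot with an axes-level point function such as stripplot or pointplot. catplot also draws points by default, but it is a figure-level function that creates its own figure, so call it without the ax argument.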
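
For reference, here is a minimal sketch of the points version, assuming sns.stripplot is an acceptable stand-in for "points/scatters". The name means, the conversion of bin and cls to strings (so that seaborn treats both as plain categories), and the marker size are my own choices for illustration, not part of the original code.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# same data as in the question
df = pd.DataFrame({'cls': [1, 2, 2, 1, 2, 1, 2, 1, 2, 1, 2],
                   'x': [10, 11, 21, 21, 8, 1, 4, 3, 5, 6, 2],
                   'y': [10, 1, 2, 2, 5, 2, 4, 3, 8, 6, 5]})

# quartile bins of x, then the mean of y per (bin, cls)
df['bin'] = pd.qcut(df['x'], 4)
means = df.groupby(['bin', 'cls'], observed=True)['y'].mean().reset_index()

# assumption: string labels keep seaborn's categorical handling simple
means['bin'] = means['bin'].astype(str)
means['cls'] = means['cls'].astype(str)

fig, ax = plt.subplots(figsize=(5, 5))

# one point per (bin, cls); dodge separates the two classes within a bin,
# jitter is switched off because each group holds a single value
sns.stripplot(x="bin", y="y", hue="cls", data=means,
              dodge=True, jitter=False, size=8, ax=ax)
ax.set_ylabel("mean of y")
```

Calling plt.show() afterwards displays the figure. sns.pointplot takes the same x, y, hue, data and ax arguments if you would rather have the points of each class joined by a line.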

