Plotting: qcut then groupby two variables

Question

I have the following dataset:

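import numpy as np
import pandas as pd

df = pd.DataFrame({'cls': [1,2,2,1,2,1,2,1,2,1,2],
                   'x': [10,11,21,21,8,1,4,3,5,6,2],
                   'y': [10,1,2,2,5,2,4,3,8,6,5]})

# split x into 4 quantile-based bins, then average y per (bin, cls) pair
df['bin'] = pd.qcut(np.array(df['x']), 4)
a = df.groupby(['bin', 'cls'])['y'].mean()
a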

This gives me

bin           cls
(0.999, 3.5]  1       2.5
              2       5.0
(3.5, 6.0]    1       6.0
              2       6.0
(6.0, 10.5]   1      10.0
              2       5.0
(10.5, 21.0]  1       2.0
              2       1.5
Name: y, dtype: float64

I want to plot the right-most column (that is, the average of y per cls per bin) per bin per class. That is, for each bin we have two values of y that I would like to plot as points/scatters. Is that possible using matplotlib or seaborn?

Answer 1

You can indeed use seaborn for what you're asking. Does this work?

# import libraries
import matplotlib.pyplot as plt
import seaborn as sns

# set up some plotting options
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(1,1,1)

# we reset index to avoid having to do multi-indexing
a = a.reset_index()

# use seaborn with argument 'hue' to do the grouping
sns.barplot(x="bin", y="y", hue="cls", data=a, ax=ax)
plt.show()

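EDIT: I've just noticed that you wanted to plot "points". I wouldn't advise it for this dataset, but you can do that by replacing barplot with stripplot or pointplot. catplot also draws points by default, but it is a figure-level function that creates its own figure, so drop the ax argument if you use it.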
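For reference, here is a minimal sketch of the points version. It assumes the axes-level sns.stripplot as the plotting call, so that it can draw onto an existing ax. The names means and the output file name are just illustrative, and converting the bins to strings is a precaution to get plain tick labels, not something the data requires.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'cls': [1,2,2,1,2,1,2,1,2,1,2],
                   'x': [10,11,21,21,8,1,4,3,5,6,2],
                   'y': [10,1,2,2,5,2,4,3,8,6,5]})
df['bin'] = pd.qcut(np.array(df['x']), 4)

# mean of y per (bin, cls); observed=True keeps only combinations present in the data
means = df.groupby(['bin', 'cls'], observed=True)['y'].mean().reset_index()

# assumption: string labels are used so the intervals show up as plain tick text
means['bin'] = means['bin'].astype(str)

fig, ax = plt.subplots(figsize=(5, 5))

# one point per (bin, cls); dodge separates the two classes within each bin,
# jitter=False keeps the points at fixed horizontal positions
sns.stripplot(data=means, x='bin', y='y', hue='cls',
              dodge=True, jitter=False, size=8, ax=ax)
ax.set_ylabel('mean of y')

# illustrative file name; use plt.show() instead in an interactive session
fig.savefig('mean_y_by_bin.png')
```

With only one value per bin and class, every marker is a single mean, so there is no spread to show. That is the main reason the bar version may still be easier to read here.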
