
Seaborn plot displot with hue and dual y-scale (twinx)

I am trying to plot the output of an ML model's predict step: the Target has the classes 1 and 0, and there is a Score. Because the dataset is imbalanced, there are few 1's.

When I plot a simple displot with Target as the hue parameter, the plot is useless for describing the 1's:

sns.set_theme()
sns.set_palette(sns.color_palette('rocket', 3))
sns.displot(df, x='Score', hue='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)
plt.show()


I want to change the scale for the 1's in the same plot, with a second y-scale on the right using twinx.

I have tried the following code, which may solve the problem with two plots, but I need only one plot. I couldn't get twinx to work.

g = sns.displot(df, x='Score', col='Target', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,400)
plt.show()


g = sns.FacetGrid(df, hue='Target')
g = g.map(sns.displot, 'Score', bins=30, linewidth=0, height=3, kde=True, aspect=1.6)


A reproducible example using the titanic dataset:

df_ = sns.load_dataset('titanic')
sns.displot(df_, x='fare', hue='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6)


g = sns.displot(df_, x='fare', col='survived', bins=30, linewidth=0, height=5, kde=True, aspect=1.6, facet_kws={'sharey': False, 'sharex': False})
g.axes[0,1].set_ylim(0,150)
plt.show()


To compare the shape of distributions with different numbers of observations, you can normalize them by setting stat="density". By default, this normalizes each distribution using the same denominator, but you can normalize each one independently by setting common_norm=False:

titanic = sns.load_dataset('titanic')
sns.displot(
    titanic, x='fare', hue='survived',
    bins=30, linewidth=0, kde=True,
    stat="density", common_norm=False,
    height=5, aspect=1.6
)


The peak of the two distributions is not at the same y value, but that is a real feature of the data: the population of survivors is spread over a wider range of fares and is less clustered at the lower end. Having two independent y axes and scaling them to equalize the height of each distribution's peak would be misleading.
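To make the two normalizations concrete, here is a small pure-Python sketch of the arithmetic as I understand it: each bar height is count / (denominator × bin width), where the denominator is the total number of observations under common normalization and the group's own size under independent normalization. The helper names (hist_density, area) and the toy numbers are made up for illustration; this is not seaborn's actual implementation.

```python
def hist_density(values, edges, denom):
    """Bar heights count / (denom * bin_width), one per bin."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            # Bins are half-open, except the last one, which includes its right edge.
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    return [c / (denom * (edges[i + 1] - edges[i])) for i, c in enumerate(counts)]


def area(heights, edges):
    """Total area under the bars."""
    return sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))


edges = [0, 10, 20, 30]
majority = [1, 2, 3, 4, 5, 6, 12, 25]  # 8 observations (toy data)
minority = [15, 28]                    # 2 observations (toy data)
total = len(majority) + len(minority)

# Common normalization: divide by the size of the whole dataset.
common = hist_density(minority, edges, total)
# Independent normalization: divide by the size of the group itself.
independent = hist_density(minority, edges, len(minority))

print(area(common, edges))       # the minority's share of the data
print(area(independent, edges))  # a full unit of area
```

With the shared denominator the minority bars cover only the group's share of the data (20% here), which is why they look flat next to the majority. With a per-group denominator each group covers an area of 1, so the shapes become comparable on a single y axis.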

I am not sure, but is this what you are looking for?

import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
df_ = sns.load_dataset('titanic')
sns.histplot(df_[df_['survived']==1]['fare'], bins=30, linewidth=0, kde=True, color='red')
ax2 = plt.twinx()
sns.histplot(df_[df_['survived']==0]['fare'], bins=30, linewidth=0, kde=True, ax=ax2, color='blue')


displot() does not list stat and common_norm in its own signature (they are mentioned in mwaskom's answer and are forwarded as keyword arguments to the underlying histplot()), but similar outputs can also be obtained by calling the kdeplot() and histplot() functions directly.

In kdeplot(), the stat and kde parameters are not required, as the function already calculates the density estimates:

sns.kdeplot(
    data=titanic, x='fare', hue='survived',
    linewidth=0, common_norm=False, fill=True
)
plt.xlim(-50, 300)


The histplot() alternative:

sns.histplot(
    data=titanic, x='fare', hue='survived',
    linewidth=0, common_norm=False, kde=True
)
plt.xlim(0, 100)


It is worth noting that the kde parameter uses the same density estimation as kdeplot() in the background, and you can specify your own KDE settings through kde_kws.

Docs: kdeplot, histplot
