
Adding null value distribution to an otherwise normal histplot in Seaborn

I am plotting a histogram using Seaborn to display a single feature's distribution across two separate populations (i.e., groups):

sns.histplot(df[[col_name, 'group']], x=col_name, hue="group", multiple="dodge", shrink=0.7)

The problem is that the feature column contains None values, which are left out of the plot. Instead, I would like the distribution of the null values to be displayed on the same histogram as the numeric values, to give a complete picture of how the distributions differ between the groups.

Is there any way to plot the distribution of the None values together with the numeric values in the same histplot?

Example:

import pandas as pd
import seaborn as sns

tmp_df = pd.DataFrame({'feature': [10,14,231,2,5,None,3, None, None, 1,5,7], 'group':[1,1,1,1,1,1,2,2,2,2,2,2]})

sns.histplot(tmp_df[['feature', 'group']], x='feature', hue="group", multiple="dodge", shrink=.7)

There is no direct way, and it is tricky because the NaNs should not be binned with the rest of the data.

One way is to replace the NaNs with an arbitrary value that is far enough from the data not to be binned with it, but close enough to display nicely. Keep in mind that this is "hacking" how a histplot works since, obviously, the NAs are not part of the underlying continuous data.
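Before picking a position, it can help to check how many values are missing per group and what range the numeric data cover. The following is a minimal sketch using the example data from the question; the names `na_counts`, `lo`, `hi` and `auto_pos` are illustrative and not part of the original answer.

```python
import pandas as pd

tmp_df = pd.DataFrame({
    'feature': [10, 14, 231, 2, 5, None, 3, None, None, 1, 5, 7],
    'group': [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
})

# Number of missing values per group (these are the rows histplot leaves out).
na_counts = tmp_df['feature'].isna().groupby(tmp_df['group']).sum()

# Range of the numeric data; min() and max() skip NaN by default.
lo, hi = tmp_df['feature'].min(), tmp_df['feature'].max()

# One possible placeholder position: 10% of the data span, negated.
# Assumption: the data are non-negative, so this lands to the left of them.
auto_pos = -(hi - lo) / 10
```

If the feature can take negative values, something like `lo - (hi - lo) / 10` is probably a safer choice, since it always falls to the left of the smallest observed value.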

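import numpy as np

# placeholder x position for the missing values (left of the real data)
pos = -20
# for automatic positioning, you could use 10% of the data span instead:
# pos = -tmp_df['feature'].agg(lambda x: x.max()-x.min())/10

ax = sns.histplot(tmp_df.fillna(pos)[['feature', 'group']],
                  x='feature', hue="group", multiple="dodge", shrink=.7)

# keep only the non-negative ticks, then add one tick at pos labelled 'NA'
t = ax.get_xticks()
t = t[t>=0]
ax.set_xticks(np.r_[pos, t])
ax.set_xticklabels(['NA']+t.tolist())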

output:

[image: resulting plot]


 