
Plotting number of observations on seaborn boxplot for subplots (python)

I have defined a function to analyse my columns with boxplots.

    fig, ax = plt.subplots((len(list_of_columns)),1,figsize= datafigsize) 
    fig.suptitle(suptitle,fontsize=30)
    ax = ax.ravel() # Ravel turns a matrix into a vector, which is easier to iterate
    plt.tight_layout(h_pad = 3,pad=10);
    
    for i, column in enumerate(list_of_columns): 
        nobs = dataframe[column].value_counts().values
        nobs = [str(y) for y in nobs.tolist()]
        nobs = ["n: " + j for j in nobs]   
        pos = range(len(nobs))
        medians = dataframe.groupby([column])['saleprice'].median().values
        for tick,label in zip(pos,ax[i].get_xticklabels()):                                   
            ax[i].text(pos[tick], medians[tick] + 0.03, nobs[tick],
                    horizontalalignment='center', size='small', color='k', weight='semibold')
            sns.boxplot(data = dataframe, 
                        x= dataframe[column], 
                        y='saleprice',
                        ax=ax[i]) 
            ax[i].set_title(list_of_titles[i],fontdict={'fontsize': 15})
            ax[i].xaxis.set_visible(True);

Subplot works fine. My numbers of observations are plotted as well.

However, the number of observations is only plotted for 6 categories. Here is an example:

Only shows n = # for 6 categories.

Most likely some other objects in your environment are causing the trouble. Also, you placed the sns.boxplot call inside the wrong for loop.
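One possible explanation for the limit of six, offered as an assumption rather than something confirmed in the question: the inner loop zips over ax[i].get_xticklabels() before anything has been drawn on that axes, and an empty Matplotlib axes usually carries six default ticks (0.0 to 1.0 in steps of 0.2). Since zip stops at the shortest input, at most six labels would be placed. The sketch below uses a plain list, default_labels, as a hypothetical stand-in for those tick labels:

```python
# Hypothetical stand-in for the tick labels of an axes that has not been drawn on yet.
default_labels = ["0.0", "0.2", "0.4", "0.6", "0.8", "1.0"]

# Nine made-up categories, each with an "n: ..." label as in the question.
nobs = ["n: " + str(k) for k in range(9)]
pos = range(len(nobs))

# zip stops at the shortest iterable, so only six of the nine labels are visited.
placed = [nobs[tick] for tick, _ in zip(pos, default_labels)]
print(len(placed))
```

Looping over the categories themselves, rather than over the tick labels, avoids this dependency on the state of the axes.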

If I set up using an example dataset:

import pandas as pd
import seaborn as sns
import numpy as np
import string
import matplotlib.pyplot as plt

Vars = [i for i in string.ascii_letters]
np.random.seed(111)
dataframe = pd.DataFrame({'saleprice':np.random.uniform(0,100,100),
                          'var1':np.random.choice(Vars[0:5],100),
                          'var2':np.random.choice(Vars[5:12],100),
                         'var3':np.random.choice(Vars[12:21],100)})

list_of_columns = ['var1','var2','var3']

As shown below, I modified the script slightly so that the median and the number of observations are calculated together in a single DataFrame. Also make sure that the plotted order and the order of your counts are the same (below, I use the index of the groupby DataFrame as the reference for both):

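# One subplot per column; the figsize here is an arbitrary choice for this example
fig, ax = plt.subplots(len(list_of_columns), 1, figsize=(8, 12))

for i, column in enumerate(list_of_columns):
    # Median and count per category, computed together so they stay aligned
    stats_df = dataframe.groupby(column)['saleprice'].agg(median='median', n='size')
    stats_df = stats_df.sort_values('median')
    # Draw the boxes once per subplot, in the same order as stats_df
    sns.boxplot(data=dataframe, x=column, y='saleprice', ax=ax[i], order=stats_df.index)
    ax[i].set_title(list_of_columns[i], fontdict={'fontsize': 15})

    for xpos in range(len(stats_df)):
        # .iloc makes the positional lookup explicit (the index holds category labels)
        label = "n= " + str(stats_df['n'].iloc[xpos])
        ypos = stats_df['median'].iloc[xpos] + 0.03
        ax[i].text(xpos, ypos, label, horizontalalignment='center', size='small')

fig.tight_layout()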
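To illustrate why the order matters, here is a minimal sketch on a made-up toy DataFrame (the data and the names counts_order and median_order are hypothetical, not from the question). value_counts() sorts by descending frequency, while groupby() sorts by the group keys, so counts and medians computed separately can end up paired with the wrong category:

```python
import pandas as pd

# Hypothetical toy data: 'b' is the most frequent category, 'a' the least.
df = pd.DataFrame({
    "var1": ["b", "b", "b", "c", "c", "a"],
    "saleprice": [10.0, 20.0, 30.0, 5.0, 7.0, 50.0],
})

# value_counts() orders categories by descending frequency.
counts_order = df["var1"].value_counts().index.tolist()

# groupby() orders categories by their sorted keys.
median_order = df.groupby("var1")["saleprice"].median().index.tolist()

print(counts_order)   # ['b', 'c', 'a']
print(median_order)   # ['a', 'b', 'c']

# Computing both statistics in one groupby keeps them aligned by construction.
stats_df = df.groupby("var1")["saleprice"].agg(["median", "size"])
print(stats_df)
```

Passing the same index to the order argument of sns.boxplot, as in the loop above, then ties the box positions to the rows of the statistics table.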


