I have defined a function to analyse my columns with boxplots:
fig, ax = plt.subplots((len(list_of_columns)),1,figsize= datafigsize)
fig.suptitle(suptitle,fontsize=30)
ax = ax.ravel() # Ravel turns a matrix into a vector, which is easier to iterate
plt.tight_layout(h_pad = 3,pad=10);
for i, column in enumerate(list_of_columns):
    nobs = dataframe[column].value_counts().values
    nobs = [str(y) for y in nobs.tolist()]
    nobs = ["n: " + j for j in nobs]
    pos = range(len(nobs))
    medians = dataframe.groupby([column])['saleprice'].median().values
    for tick,label in zip(pos,ax[i].get_xticklabels()):
        ax[i].text(pos[tick], medians[tick] + 0.03, nobs[tick],
                   horizontalalignment='center', size='small', color='k', weight='semibold')
        sns.boxplot(data = dataframe,
                    x= dataframe[column],
                    y='saleprice',
                    ax=ax[i])
        ax[i].set_title(list_of_titles[i],fontdict={'fontsize': 15})
        ax[i].xaxis.set_visible(True);
The subplots render fine, and the numbers of observations are plotted as well.
However, the number of observations is only plotted for 6 categories, no matter how many categories a column has. Here is an example:
[Figure: boxplots where the n = # label is shown for only 6 categories]
Most likely you have some other objects in your environment that are causing the trouble. You also placed the sns.boxplot call inside the wrong for loop.
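One plausible explanation for the cut-off, offered as an assumption rather than something confirmed in the question: zip stops at the shorter of its two inputs, and ax[i].get_xticklabels() is evaluated before anything has been drawn on that axes, so it returns only the default ticks of an empty plot. A minimal sketch of that effect (the categories list is hypothetical):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Hypothetical column with more categories than an empty axes has default ticks
categories = list("abcdefghi")

# Tick labels of an axes that has not been plotted on yet
default_labels = ax.get_xticklabels()

# zip stops as soon as the shorter input is exhausted
pairs = list(zip(range(len(categories)), default_labels))

print(len(categories), len(default_labels), len(pairs))
plt.close(fig)
```

Looping over the categories themselves, rather than over the tick labels, avoids depending on how many ticks the axes happens to have.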
If I set up using an example dataset:
import pandas as pd
import seaborn as sns
import numpy as np
import string
import matplotlib.pyplot as plt
Vars = [i for i in string.ascii_letters]
np.random.seed(111)
dataframe = pd.DataFrame({'saleprice':np.random.uniform(0,100,100),
                          'var1':np.random.choice(Vars[0:5],100),
                          'var2':np.random.choice(Vars[5:12],100),
                          'var3':np.random.choice(Vars[12:21],100)})
list_of_columns = ['var1','var2','var3']
As you can see below, I modified the script slightly so that the median and the number of observations are calculated together in a single DataFrame. Also make sure that the plotted order and the order of your counts are the same (I used the index of the grouped DataFrame as the reference below):
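# One subplot per column; this step was missing from the snippet, and figsize is an arbitrary choice
fig, ax = plt.subplots(len(list_of_columns), 1, figsize=(8, 12))
for i, column in enumerate(list_of_columns):
    # Median and group size in one DataFrame, so both share the same index
    stats_df = dataframe.groupby(column)['saleprice'].agg(median='median', n='size')
    stats_df = stats_df.sort_values('median')
    # Pass the same order to seaborn so box positions match the rows of stats_df
    sns.boxplot(data=dataframe, x=column, y='saleprice', ax=ax[i], order=stats_df.index)
    ax[i].set_title(list_of_columns[i], fontdict={'fontsize': 15})
    for xpos in range(len(stats_df)):
        # .iloc selects by position; the index holds category labels, not integers
        label = "n= " + str(stats_df['n'].iloc[xpos])
        ypos = stats_df['median'].iloc[xpos] + 0.03
        ax[i].text(xpos, ypos, label, horizontalalignment='center', size='small')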
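The ordering point can be checked without plotting anything. The sketch below uses a small made-up DataFrame (not the data from the question) to show that value_counts sorts by count while groupby sorts by group key, so taking .values from each and pairing them by position can attach a count to the wrong median:

```python
import pandas as pd

# Small made-up dataset for illustration only
df = pd.DataFrame({
    "saleprice": [10, 20, 30, 40, 50, 60],
    "var1": ["b", "b", "b", "a", "c", "c"],
})

# Sorted by count, descending
counts = df["var1"].value_counts()

# Sorted by group key
medians = df.groupby("var1")["saleprice"].median()

# Computing both in one aggregation keeps them aligned on the same index
stats = df.groupby("var1")["saleprice"].agg(median="median", n="size")

print(list(counts.index))
print(list(medians.index))
print(stats)
```

Because stats carries both values per category, passing stats.index as order to sns.boxplot keeps the text labels and the boxes in step.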