
How to center the histogram bars around tick marks using seaborn displot? Stacking bars is essential

I have looked at many ways of centering histogram bars on tick marks, but I have not found one that works with seaborn's displot. displot lets me stack the histogram according to a column in the dataframe, so I would prefer a solution that uses displot, or anything else that allows stacking based on a dataframe column with color-coding via a palette.

Even after setting the tick values, I am not able to get the bars to center around the tick marks.

Example code

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Attempt to center the histogram bars on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
                hue="day", multiple = 'stack', data=tips)
plt.xticks(np.arange(0, 50, 5))


I would also like to plot a histogram of a variable that takes a single value, and choose the bin width so that the resulting bar is centered on that value (0.5 in this example).

I can center the bar by setting the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I widen the bin when there is only one bar, while still displaying all the other possible points on the axis? I want the bar to stay centered on the 0.5 tick mark but be wider, since it is the only value with any counts. Any solutions?

tips['single'] = 0.5
sns.displot(x='single',
                hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))

Edit: Would it be possible to have more control over the tick marks in the second case? I would rather not show every tick rounded to one decimal place, but choose which tick marks to display. Is it possible to display just one tick value and have the bar centered on it?

Do min_val and max_val here refer to the values of the variable? If the minimum is 0, the x-axis would extend to negative values even though the data has none, and I don't want to display them.

For your first problem, it helps to work out a few properties of the data you're plotting, such as its range. You may also want to choose the number of bins beforehand.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val  # Range covered by the data
n_bins = 10
bin_width = val_width/n_bins

sns.histplot(x="total_bill",
                hue="day", multiple = 'stack', data=tips,
                bins=n_bins, binrange=(min_val, max_val),
                palette='Paired')
plt.xlim(0, 55) # Define x-axis limits

Another thing to remember is that the width of a bar in a histogram marks the bounds of its range. A bar spanning [2,5] on the x-axis means that the values it represents fall within that range.

With this in mind, a solution is easy to formulate. Suppose we first want ticks at the bounds of each bar in the original plot; one way to get them is

plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))

Figure: bounded bars (ticks at the bin edges)
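A small sketch of the same edge computation using np.linspace instead of np.arange. The NumPy documentation warns that np.arange with a non-integer step can return an array that is one element longer or shorter than expected, whereas np.linspace returns exactly the number of points requested. The minimum and maximum below are assumed stand-ins for tips.total_bill.min() and tips.total_bill.max(); substitute the real values.

```python
import numpy as np

min_val, max_val = 3.07, 50.81   # assumed stand-ins for the total_bill range
n_bins = 10

# n_bins bars need n_bins + 1 edges, with both end points included.
edges = np.linspace(min_val, max_val, n_bins + 1)
bin_width = edges[1] - edges[0]
```

Passing edges to plt.xticks should then put one tick on every bar boundary.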

Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.

plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))

Figure: centered ticks (Paired palette)
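If round tick values such as 0, 5, 10, ... matter more than the exact data range, the same idea can be run in reverse: pick the ticks first and derive bin edges that sit half a step on either side of each one. This is only a sketch; edges_centered_on is a hypothetical helper name, and it assumes the ticks are evenly spaced.

```python
import numpy as np

def edges_centered_on(ticks):
    """Return bin edges whose bin midpoints are the given evenly spaced ticks."""
    ticks = np.asarray(ticks, dtype=float)
    step = ticks[1] - ticks[0]
    # One edge half a step left of every tick, plus a closing edge on the right.
    return np.concatenate([ticks - step / 2, [ticks[-1] + step / 2]])

ticks = np.arange(0, 55, 5)                # 0, 5, ..., 50
edges = edges_centered_on(ticks)           # -2.5, 2.5, ..., 52.5
centers = (edges[:-1] + edges[1:]) / 2     # midpoints of consecutive edges
```

Since the bins parameter of sns.histplot also accepts a sequence of edges, passing bins=edges together with plt.xticks(ticks) should center each stacked bar on a tick.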

For your single-value plot, the idea remains the same: control the bin width, the x-axis range and the ticks. The bin width has to be set explicitly, because the automatically inferred width is unlikely to suit a single value and can leave the bar too thin to see. Histogram bars always indicate a range, even when there is only a single value. This is illustrated in the following example and figure.

from matplotlib.ticker import FormatStrFormatter  # Needed for the tick label format below

single_val = 23.5
tips['single'] = single_val
bin_width = 4

fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots 

# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
                hue="day", multiple = 'stack', data=tips, 
                binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
                palette='rocket',
                ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
    title='Given value as tick-label starts the bin on x-axis',
    xticks=ticks,
    xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))

# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
                hue="day", multiple = 'stack', data=tips, 
                binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
                palette='rocket',
                ax=axs[1])

ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
    title='Bin centre is offset from single_value by bin_width/2',
    xticks=ticks,
    xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))

Output:

Figure: single-value plots
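To see why the bar starts at the given value rather than being centered on it, it may help to look at the binning alone. The sketch below uses np.histogram, which, as far as I know, seaborn relies on for its counts. A value that sits exactly on an inner edge is counted in the bin to its right, so shifting the range by half a bin width puts the value at the bin midpoint. The width of 0.2 and the count of 244 rows are assumptions chosen for illustration.

```python
import numpy as np

single_val = 0.5
bin_width = 0.2                      # hypothetical width, chosen for illustration
values = np.full(244, single_val)    # 244 identical observations (assumed count)

# Range of one bin width on each side: the value lies on the inner edge,
# so it is counted in the bin that starts at single_val.
edges_a = np.array([single_val - bin_width, single_val, single_val + bin_width])
counts_a, _ = np.histogram(values, bins=edges_a)

# Range shifted by half a bin width: one bin whose midpoint is single_val.
centered_range = (single_val - bin_width / 2, single_val + bin_width / 2)
counts_b, edges_b = np.histogram(values, bins=1, range=centered_range)
center_b = (edges_b[0] + edges_b[1]) / 2
```

In seaborn terms, passing the two centered edges as bins=[single_val - bin_width/2, single_val + bin_width/2] to sns.histplot, with xticks=[single_val], should give a single bar centered on its only tick.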

From your description, I suspect that what you really want is a categorical bar graph. The centering is then automatic, because each bar is no longer a range but a discrete category. Given the numeric, continuous nature of the variable in the example data, I would not recommend this approach. Still, pandas can draw categorical bar plots. For our example, one way to do this is as follows:

n_colors = tips['day'].nunique() # Get number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
    val_count=('single', 'count'),
    val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
    stacked=True,
    color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
    width=0.05 # Set bar width
    ) 
plt.show()

This yields:

Figure: pandas categorical plot
