
Add more descriptive labelling to x-axis of Matplotlib histogram in Python

I have created a histogram in a Jupyter notebook to show the distribution of time on page in seconds for 100 web visits.

Code as follows:

from matplotlib.ticker import StrMethodFormatter  # used for the y-axis tick formatter below

ax = df.hist(column='time_on_page', bins=25, grid=False, figsize=(12,8), color='#86bf91', zorder=2, rwidth=0.9)

ax = ax[0]
for x in ax:

    # Despine
    x.spines['right'].set_visible(False)
    x.spines['top'].set_visible(False)
    x.spines['left'].set_visible(False)

    # Switch off ticks
    x.tick_params(axis="both", which="both", bottom=False, top=False, labelbottom=True, left=False, right=False, labelleft=True)

    
    # Draw horizontal axis lines
    vals = x.get_yticks()
    for tick in vals:
        x.axhline(y=tick, linestyle='dashed', alpha=0.4, color='#eeeeee', zorder=1)

    # Set title
    x.set_title("Time on Page Histogram", fontsize=20, weight='bold')

    # Set x-axis label
    x.set_xlabel("Time on Page Duration (Seconds)", labelpad=20, weight='bold', size=12)

    # Set y-axis label
    x.set_ylabel("Page Views", labelpad=20, weight='bold', size=12)

    # Format y-axis tick labels
    x.yaxis.set_major_formatter(StrMethodFormatter('{x:,g}'))

This produces the following visualisation:

[Image: the resulting histogram]

I'm generally happy with the appearance; however, I'd like the x-axis to be a little more descriptive, perhaps showing the range of each bin and the percentage of the total that each bin represents.

I have looked for this in the Matplotlib documentation but cannot seem to find anything that would allow me to achieve this.

Any help greatly appreciated.

When you set bins=25, 25 equally spaced bins are created between the lowest and highest values encountered. Labelling the bins with these ranges can be confusing, because the boundaries fall on arbitrary values. It seems more appropriate to round the bin boundaries, for example to multiples of 20. These values can then be used as tick marks on the x-axis, sitting neatly between the bins.
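The rounding described above can be sketched in plain Python. This is only an illustration: the helper name `rounded_bin_edges` and its parameters `n_bins` and `step` are made up for this example, and the edges are assumed to start at 0.

```python
import math

def rounded_bin_edges(max_x, n_bins=25, step=20):
    """Return bin edges starting at 0 whose width is a multiple of `step`.

    Hypothetical helper for illustration; not part of Matplotlib or pandas.
    """
    # Width that n_bins equal bins would need, rounded to a multiple of step.
    # max(step, ...) keeps the width from rounding down to zero.
    width = max(step, round(max_x / n_bins / step) * step)
    # Enough edges so that the last one is at or beyond max_x.
    n_edges = math.ceil(max_x / width) + 1
    return [i * width for i in range(n_edges)]

print(rounded_bin_edges(130))  # [0, 20, 40, 60, 80, 100, 120, 140]
```

For a small maximum such as 130, the raw width 130 / 25 = 5.2 would round to zero multiples of 20, so the max(step, ...) guard falls back to a width of 20.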

The percentages can be added by looping through the bars (rectangular patches). The height of each bar is the number of rows that fall into its bin, so dividing by the total number of rows and multiplying by 100 gives a percentage. The text can be positioned using the bar's x position plus half its width, together with its height.

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'time_on_page': np.random.lognormal(4, 1.1, 100)})
max_x = df['time_on_page'].max()
bin_width = max(20, np.round(max_x / 25 / 20) * 20) # round to multiple of 20, use max(20, ...) to avoid rounding to zero
bins = np.arange(0, max_x + bin_width, bin_width)
axes = df.hist(column='time_on_page', bins=bins, grid=False, figsize=(12, 8), color='#86bf91', rwidth=0.9)
ax = axes[0, 0]
total = len(df)
ax.set_xticks(bins)
for p in ax.patches:
    h = p.get_height()
    if h > 0:
        ax.text(p.get_x() + p.get_width() / 2, h, f'{h / total * 100.0  :.0f} %\n', ha='center', va='center')
ax.grid(True, axis='y', ls=':', alpha=0.4)
ax.set_axisbelow(True)
for side in ['left', 'right', 'top']:  # 'side' avoids shadowing the built-in dir()
    ax.spines[side].set_visible(False)
ax.tick_params(axis="y", length=0)  # Switch off y ticks
ax.margins(x=0.02) # tighter x margins
plt.show()

[Image: example plot]
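If an explicit range label for every bin is preferred over ticks at the bin boundaries, one possible approach is to build the label strings from the bin edges and the counts. The following is a minimal sketch in plain Python; the function name `bin_labels` is hypothetical, and the binning is assumed to follow the usual histogram convention in which every bin is half-open except the last one, which includes its upper edge.

```python
def bin_labels(values, edges):
    """Build one 'low-high' label with a percentage line per bin.

    Hypothetical helper for illustration; not part of Matplotlib or pandas.
    """
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            # Bins are [low, high), except the last bin, which is [low, high].
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[i + 1]):
                counts[i] += 1
                break
    total = len(values)
    return [
        f'{low:g}-{high:g}\n{count / total * 100:.0f} %'
        for low, high, count in zip(edges[:-1], edges[1:], counts)
    ]

labels = bin_labels([5, 12, 25, 31, 38, 40], [0, 20, 40])
print(labels)  # ['0-20\n33 %', '20-40\n67 %']
```

Such labels could then be placed at the bin centres with ax.set_xticks(centres) and ax.set_xticklabels(labels). With many bins, the labels may overlap and might need rotating or a smaller font.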
