Pandas plot hist sharex=False does not behave as expected

Question

I am trying to plot histograms of several series (columns) from a dataframe. The series have different maximum values:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].max()

returns:

age_sent       1516.564016
last_seen       986.790035
forum_reply     137.000000
forum_cnt       155.000000
forum_exp        13.000000
forum_quest      10.000000

When I plot the histograms with sharex=False, subplots=True, the sharex setting appears to be ignored:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].plot.hist(figsize=(20, 10), logy=True, sharex=False, subplots=True)

I can plot each of them separately, but that is less convenient. I would also like to know what I am doing wrong.

The data I have is too big to be included, but it is easy to create something similar:

import numpy as np
import pandas as pd

ttt = pd.DataFrame({'a': np.random.uniform(1, 1000, 100), 'b': np.random.uniform(1, 10, 100)})

Plotting it in one call with subplots:

ttt.plot.hist(logy=True, sharex=False, subplots=True)

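Check the x axis: both subplots cover the same range. I want each subplot to use its own column's range, as it does when the columns are plotted separately (but using one command with subplots):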

ttt['a'].plot.hist(logy=True)
ttt['b'].plot.hist(logy=True)

Answer 1

The sharex argument is (most likely) just passed through to matplotlib (mpl), where it links the x-axis limits of the subplots so that panning or zooming one axes also changes the others.
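To see what sharex does on its own, here is a small sketch using matplotlib's plt.subplots directly (no pandas involved). The two arrays are made-up stand-ins for the question's columns. With sharex=True both axes end up with one common x range; with sharex=False each axes autoscales to its own data. In both cases each hist call computes its own bins, which is why sharex alone cannot explain the pandas behaviour.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
wide = rng.uniform(1, 1000, 100)   # stand-in for a column like 'a'
narrow = rng.uniform(1, 10, 100)   # stand-in for a column like 'b'


def xlims(sharex):
    """Histogram both arrays on two stacked axes and return their x-limits."""
    fig, (ax1, ax2) = plt.subplots(2, 1, sharex=sharex)
    ax1.hist(wide, bins=10)    # bins computed from `wide` only
    ax2.hist(narrow, bins=10)  # bins computed from `narrow` only
    limits = (ax1.get_xlim(), ax2.get_xlim())
    plt.close(fig)
    return limits


shared = xlims(sharex=True)        # both axes report the same (wide) range
independent = xlims(sharex=False)  # each axes keeps its own range
print(shared)
print(independent)
```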

The issue you are having is that the same bins are used for all of the histograms (this is enforced by https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L2053, if I am understanding the code correctly). pandas assumes that if you plot multiple histograms, you are probably plotting columns of similar data, so using the same binning makes them comparable.
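The effect can be reproduced with NumPy alone. This sketch assumes, as the answer reads the pandas source, that the shared bin edges are computed once from all columns pooled together; the arrays are made-up data shaped like the question's ttt example. With shared edges, every value of the narrow column falls into the first bin, so its subplot still spans the full range of the wide column.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(1, 1000, 100)
b = rng.uniform(1, 10, 100)

# Shared binning: edges computed once from the pooled values of every column.
_, shared_edges = np.histogram(np.concatenate([a, b]), bins=10)
b_counts_shared, _ = np.histogram(b, bins=shared_edges)

# Per-column binning: edges computed from this column's own range only.
b_counts_own, b_edges_own = np.histogram(b, bins=10)

print(shared_edges[:2])       # the first shared bin is roughly 100 wide
print(b_counts_shared)        # every value of `b` lands in the first bin
print(b_edges_own[[0, -1]])   # `b`'s own edges span only about 1 to 10
```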

Assuming you have mpl >= 1.5 and numpy >= 1.11, you can write yourself a small helper function like this:

import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd
import numpy as np

plt.ion()


def make_hists(df, fig_kwargs=None, hist_kwargs=None,
               style_cycle=None):
    '''
    Plot one histogram per column of `df`, each on its own axes
    and with its own bins.

    Parameters
    ----------
    df : pd.DataFrame
        Datasource

    fig_kwargs : dict, optional
        kwargs to pass to `plt.subplots`

        defaults to {'figsize': (4, 1.5*len(df.columns)),
                     'tight_layout': True}

    hist_kwargs : dict, optional
        Extra kwargs to pass to `ax.hist`, defaults
        to `{'log': True, 'bins': 'auto'}`

    style_cycle : cycler, optional
        Style cycle to use, defaults to
        mpl.rcParams['axes.prop_cycle']

    Returns
    -------
    fig : mpl.figure.Figure
        The figure created

    ax_list : list
        The mpl.axes.Axes objects created

    arts : dict
        maps column names to the (n, bins, patches) tuple
        returned by `ax.hist`
    '''
    if style_cycle is None:
        style_cycle = mpl.rcParams['axes.prop_cycle']

    # copy so that setdefault below does not mutate the caller's dicts
    fig_kwargs = dict(fig_kwargs or {})
    hist_kwargs = dict(hist_kwargs or {})

    hist_kwargs.setdefault('log', True)
    # 'auto' bin selection requires numpy >= 1.11
    hist_kwargs.setdefault('bins', 'auto')
    cols = df.columns

    fig_kwargs.setdefault('figsize', (4, 1.5*len(cols)))
    fig_kwargs.setdefault('tight_layout', True)
    # squeeze=False always returns a 2D array, even for a single column
    fig, ax_arr = plt.subplots(len(cols), 1, squeeze=False, **fig_kwargs)
    ax_lst = ax_arr.ravel()
    arts = {}
    for ax, col, sty in zip(ax_lst, cols, style_cycle()):
        # each call bins only this column's (non-NaN) data
        h = ax.hist(df[col].dropna(), label=col, **hist_kwargs, **sty)
        ax.legend()

        arts[col] = h

    return fig, list(ax_lst), arts

dist = [1, 2, 5, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

make_hists(test_df)

Answer 2

The accepted answer works, but there is an easier workaround in recent versions.

While df.plot.hist does not respect sharex=False, df.plot.density does (density plots are kernel density estimates, so SciPy must be installed):

import numpy as np
import pandas as pd

dist = [1, 2, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

test_df.plot.density(subplots=True, sharex=False, sharey=False, layout=(-1, 2))
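If you want actual histograms rather than density curves, another option not mentioned in either answer is DataFrame.hist (not DataFrame.plot.hist). As far as I know it draws one subplot per column, bins each column on its own data range, and defaults to sharex=False; extra keyword arguments such as log=True are forwarded to matplotlib's Axes.hist. A minimal sketch on data shaped like the question's ttt example, assuming a reasonably recent pandas and matplotlib:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ttt = pd.DataFrame({
    "a": rng.uniform(1, 1000, 100),
    "b": rng.uniform(1, 10, 100),
})

# DataFrame.hist draws one subplot per column, titled with the column name;
# each column gets its own bins, so the x ranges differ.
axes = ttt.hist(bins=10, log=True, figsize=(8, 3))
flat_axes = axes.ravel()  # hist returns a 2D array of Axes
for ax in flat_axes:
    print(ax.get_title(), ax.get_xlim())
```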

Pandas plot hist sharex=False does not behave as expected

Question

2 answers

solution1
4 ACCEPTED 2016-09-01 15:05:06

solution2
3 2019-12-05 05:37:32
