
Pandas plot hist sharex=False does not behave as expected

I am trying to plot histograms of several series from a dataframe. The series have different maximum values:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].max()

returns:

age_sent       1516.564016
last_seen       986.790035
forum_reply     137.000000
forum_cnt       155.000000
forum_exp        13.000000
forum_quest      10.000000

When I plot the histograms I pass sharex=False, subplots=True, but it looks like the sharex argument is ignored:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].plot.hist(figsize=(20, 10), logy=True, sharex=False, subplots=True)

I can clearly plot each of them separately, but this is less desirable. Also I would like to know what I am doing wrong.


The data I have is too big to be included, but it is easy to create something similar:

import numpy as np
import pandas as pd

ttt = pd.DataFrame({
    'a': np.random.uniform(1, 1000, 100),
    'b': np.random.uniform(1, 10, 100),
})

Now I have:

ttt.plot.hist(logy=True, sharex=False, subplots=True)

Check the x axis: both subplots use the same range. I want it to look like this instead (but using one command with subplots):

ttt['a'].plot.hist(logy=True)
ttt['b'].plot.hist(logy=True)

The sharex argument (most likely) just falls through to mpl, where it controls whether panning or zooming one axes also changes the other.
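As a minimal sketch of what sharex does on the matplotlib side (my own illustration, not pandas code; the Agg backend line is only an assumption so that it runs without a display): with sharex=True the x limits of the axes are linked, with sharex=False they stay independent.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Linked x axes: changing the limits on one axes changes the other too.
fig_shared, (top_s, bottom_s) = plt.subplots(2, 1, sharex=True)
top_s.set_xlim(0, 5)
shared_limits = bottom_s.get_xlim()

# Independent x axes: the second axes keeps its default limits.
fig_free, (top_f, bottom_f) = plt.subplots(2, 1, sharex=False)
top_f.set_xlim(0, 5)
free_limits = bottom_f.get_xlim()

print(shared_limits, free_limits)
```

So sharex only decides whether the view limits are tied together; it says nothing about how the histogram bins are chosen.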

The issue you are having is that the same bins are used for all of the histograms (which is enforced by https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L2053, if I am understanding the code correctly). pandas assumes that if you plot multiple histograms you are probably plotting columns of similar data, so using the same binning makes them comparable.
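To see the effect of shared binning in isolation, here is a minimal NumPy-only sketch. It assumes that computing one set of bin edges over all values together is a fair stand-in for what df.plot.hist does internally; the variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch repeatable
a = rng.uniform(1, 1000, 100)
b = rng.uniform(1, 10, 100)

# One set of bin edges computed over all values together
# (assumption: this mimics the shared binning described above).
_, shared_edges = np.histogram(np.concatenate([a, b]), bins=10)

# Per-column edges, as you get when each column is plotted on its own.
_, own_edges_b = np.histogram(b, bins=10)

counts_b_shared, _ = np.histogram(b, bins=shared_edges)
counts_b_own, _ = np.histogram(b, bins=own_edges_b)

print(counts_b_shared)  # 'b' binned with the shared edges
print(counts_b_own)     # 'b' binned with its own edges
```

With the shared edges each bin is roughly 100 units wide, so every value of the narrow column lands in the first bin. With its own edges the column is spread over several bins.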

Assuming you have mpl >= 1.5 and numpy >= 1.11 (and Python >= 3.5 for the double ** unpacking in the hist call), you should write yourself a little helper function like

import matplotlib.pyplot as plt
import matplotlib as mpl 
import pandas as pd
import numpy as np

plt.ion()


def make_hists(df, fig_kwargs=None, hist_kwargs=None,
               style_cycle=None):
    '''Plot each column of df as its own histogram with independent bins.

    Parameters
    ----------
    df : pd.DataFrame
        Datasource

    fig_kwargs : dict, optional
        kwargs to pass to `plt.subplots`

        defaults to {'figsize': (4, 1.5*len(df.columns)),
                     'tight_layout': True}

    hist_kwargs : dict, optional
        Extra kwargs to pass to `ax.hist`, defaults
        to `{'log': True, 'bins': 'auto'}`

    style_cycle : cycler
        Style cycle to use, defaults to 
        mpl.rcParams['axes.prop_cycle']

    Returns
    -------
    fig : mpl.figure.Figure
        The figure created

    ax_list : list
        The mpl.axes.Axes objects created 

    arts : dict 
        maps column names to the histogram artist
    '''
    if style_cycle is None:
        style_cycle = mpl.rcParams['axes.prop_cycle']

    if fig_kwargs is None:
        fig_kwargs = {}
    if hist_kwargs is None:
        hist_kwargs = {}

    hist_kwargs.setdefault('log', True)
    # this requires numpy >= 1.11
    hist_kwargs.setdefault('bins', 'auto')
    cols = df.columns

    fig_kwargs.setdefault('figsize', (4, 1.5*len(cols)))
    fig_kwargs.setdefault('tight_layout', True)
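    # squeeze=False so a single-column frame still yields an array of axes
    fig, ax_lst = plt.subplots(len(cols), 1, squeeze=False, **fig_kwargs)
    ax_lst = ax_lst.ravel()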
    arts = {}
    for ax, col, sty in zip(ax_lst, cols, style_cycle()):
        h = ax.hist(col, data=df, **hist_kwargs, **sty)
        ax.legend()

        arts[col] = h

    return fig, list(ax_lst), arts

dist = [1, 2, 5, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

make_hists(test_df)


The current answer works, but there is an easier workaround in recent versions.

While df.plot.hist does not respect sharex=False, df.plot.density does (note that it requires SciPy).

dist = [1, 2, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

test_df.plot.density(subplots=True, sharex=False, sharey=False, layout=(-1, 2))

[Image: the density plot respects sharex]
