
Pandas plot hist sharex=False does not behave as expected

I am trying to plot histograms of several series from a dataframe. The series have different maximum values:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].max()

returns:

age_sent       1516.564016
last_seen       986.790035
forum_reply     137.000000
forum_cnt       155.000000
forum_exp        13.000000
forum_quest      10.000000

When I plot the histograms I pass sharex=False, subplots=True, but it looks like the sharex argument is ignored:

df[[
    'age_sent', 'last_seen', 'forum_reply', 'forum_cnt', 'forum_exp', 'forum_quest'
]].plot.hist(figsize=(20, 10), logy=True, sharex=False, subplots=True)



I can of course plot each of them separately, but this is less desirable. I would also like to know what I am doing wrong.


The data I have is too big to be included, but it is easy to create something similar:

ttt = pd.DataFrame({'a': pd.Series(np.random.uniform(1, 1000, 100)), 'b': pd.Series(np.random.uniform(1, 10, 100))})

Now I have:

ttt.plot.hist(logy=True, sharex=False, subplots=True)

Check the x axis. I want it to look like this (but using one command with subplots):

ttt['a'].plot.hist(logy=True)
ttt['b'].plot.hist(logy=True)

sharex (most likely) just falls through to mpl and controls whether panning/zooming one axes also changes the other.
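A minimal sketch of what sharex does on the matplotlib side, using empty axes and no pandas at all. The Agg backend line is only an assumption so the snippet runs without a display; drop it for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt

# With sharex=True the x limits of the two axes are linked.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.set_xlim(0, 5)
linked = ax2.get_xlim()        # follows ax1

# With sharex=False each axes keeps its own x limits.
fig2, (ax3, ax4) = plt.subplots(2, 1, sharex=False)
ax3.set_xlim(0, 5)
independent = ax4.get_xlim()   # still the default limits of an empty axes

print(linked, independent)
```

So sharex only links the view limits; it does not decide how the histogram bins are computed.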

The issue you are having is that the same bins are being used for all of the histograms (which is enforced by https://github.com/pydata/pandas/blob/master/pandas/tools/plotting.py#L2053 if I am understanding the code correctly). pandas assumes that if you plot multiple histograms then you are probably plotting columns of similar data, so using the same binning makes them comparable.
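To see why shared bins hide the small column, here is a rough numpy-only illustration. It is not the pandas code itself, just a sketch that assumes the bin edges are computed once over the combined range of all columns.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.uniform(1, 1000, 100)   # wide column
b = rng.uniform(1, 10, 100)     # narrow column

# Edges computed over both columns together (the assumed shared binning).
shared_edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=10)
counts_b_shared, _ = np.histogram(b, bins=shared_edges)

# Edges computed from column b alone.
own_edges = np.histogram_bin_edges(b, bins=10)
counts_b_own, _ = np.histogram(b, bins=own_edges)

print(counts_b_shared)  # every b value lands in the first, very wide bin
print(counts_b_own)     # b spread over bins that fit its own range
```

With the shared edges each bin is roughly 100 units wide, so a column that only spans 1 to 10 collapses into a single bar no matter what the x limits are.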

Assuming you have mpl >= 1.5 and numpy >= 1.11, you should write yourself a little helper function like:

import matplotlib.pyplot as plt
import matplotlib as mpl 
import pandas as pd
import numpy as np

plt.ion()


def make_hists(df, fig_kwargs=None, hist_kwargs=None,
               style_cycle=None):
    '''

    Parameters
    ----------
    df : pd.DataFrame
        Datasource

    fig_kwargs : dict, optional
        kwargs to pass to `plt.subplots`

        defaults to {'figsize': (4, 1.5*len(df.columns)),
                     'tight_layout': True}

    hist_kwargs : dict, optional
        Extra kwargs to pass to `ax.hist`, defaults
        to `{'log': True, 'bins': 'auto'}`

    style_cycle : cycler
        Style cycle to use, defaults to 
        mpl.rcParams['axes.prop_cycle']

    Returns
    -------
    fig : mpl.figure.Figure
        The figure created

    ax_list : list
        The mpl.axes.Axes objects created 

    arts : dict 
        maps column names to the histogram artist
    '''
    if style_cycle is None:
        style_cycle = mpl.rcParams['axes.prop_cycle']

    if fig_kwargs is None:
        fig_kwargs = {}
    if hist_kwargs is None:
        hist_kwargs = {}

    hist_kwargs.setdefault('log', True)
    # this requires numpy >= 1.11
    hist_kwargs.setdefault('bins', 'auto')
    cols = df.columns

    fig_kwargs.setdefault('figsize', (4, 1.5*len(cols)))
    fig_kwargs.setdefault('tight_layout', True)
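    fig, ax_lst = plt.subplots(len(cols), 1, **fig_kwargs)
    # plt.subplots returns a bare Axes for a single column; make it iterable
    ax_lst = np.atleast_1d(ax_lst)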
    arts = {}
    for ax, col, sty in zip(ax_lst, cols, style_cycle()):
        h = ax.hist(col, data=df, **hist_kwargs, **sty)
        ax.legend()

        arts[col] = h

    return fig, list(ax_lst), arts

dist = [1, 2, 5, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

make_hists(test_df)
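If a full helper is more than you need, the same idea can be sketched in a few lines: create the axes yourself and let each column compute its own bins. This reuses the ttt frame from the question; the Agg backend line is an assumption for headless runs only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

ttt = pd.DataFrame({
    'a': np.random.uniform(1, 1000, 100),
    'b': np.random.uniform(1, 10, 100),
})

# One row per column; squeeze=False keeps `axes` a 2-D array even for one column.
fig, axes = plt.subplots(len(ttt.columns), 1, sharex=False, squeeze=False)
for ax, col in zip(axes.ravel(), ttt.columns):
    # Each Series is binned on its own range, so the x axes differ.
    ttt[col].plot.hist(ax=ax, logy=True, title=col)

xlims = [ax.get_xlim() for ax in axes.ravel()]
print(xlims)
```

It is still a loop rather than a single call, but it avoids the shared binning because every Series.plot.hist call only sees its own data.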


The current answer works, but there is an easier workaround in recent versions.

While df.plot.hist does not respect sharex=False, df.plot.density does.

dist = [1, 2, 7, 50]
col_names = ['weibull $a={}$'.format(alpha) for alpha in dist]
test_df = pd.DataFrame(np.random.weibull(dist,
                                         (10000, len(dist))),
                       columns=col_names)

test_df.plot.density(subplots=True, sharex=False, sharey=False, layout=(-1, 2))

[Image: density plot respects sharex]
