
How to combine 2 dataframe histograms in 1 plot?

I would like to use code that shows all the histograms in a dataframe. That would be df.hist(bins=10). However, I would also like to add a second set of histograms showing the CDF: df_hist = df.hist(cumulative=True, bins=100, density=1, histtype="step")

I tried separating their matplotlib axes by using fig = plt.figure() and plt.subplot(211). But df.hist is a pandas function, not a matplotlib function. I also tried creating the axes myself and passing ax=ax1 and ax=ax2 to each histogram call, but it didn't work.

How can I combine these histograms? Any help is appreciated.

The histograms that I want to combine look like these. I want to show them side by side, or put the second one on top of the first one. Sorry that I didn't take the time to make them look good. [Images: histogram 1, histogram 2]

It is possible to draw them together:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# toy data frame
df = pd.DataFrame(np.random.normal(0,1,(100,20)))

# draw hist
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)

# clone axes so they have different scales (note: kind='kde' requires SciPy)
ax_new = [ax.twinx() for ax in axes.flatten()]
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()

Output:

[Figure: 5x4 grid of histograms, each overlaid with a KDE curve on a twin y-axis]

It's also possible to draw them in separate subplots. For example,

# 40 axes in total: the first 20 for the histograms, the last 20 for the KDE plots
fig, axes = plt.subplots(10,4, figsize=(16,10))
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)

kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)

will plot the histograms in the top five rows of the grid and the KDE plots in the bottom five rows.
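If each variable's two plots should sit next to each other instead, one possible variation is to give the figure one row per column of the dataframe and pass the left and right columns of axes to two separate df.plot calls. The sketch below is an adaptation, not part of the original answer: the small four-column frame is made up for illustration, and the KDE is swapped for the cumulative step histogram asked about in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Small hypothetical data frame (4 columns keeps the figure readable)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(0, 1, (100, 4)), columns=list('ABCD'))

# One row of axes per column of df: left = counts, right = cumulative
nvars = len(df.columns)
fig, axes = plt.subplots(nvars, 2, figsize=(8, 2.5 * nvars))

# Left column of axes: ordinary histograms
df.plot(kind='hist', subplots=True, ax=axes[:, 0], bins=10, alpha=0.5)

# Right column of axes: normalised cumulative step histograms
df.plot(kind='hist', subplots=True, ax=axes[:, 1], bins=100,
        cumulative=True, density=True, histtype='step')

fig.tight_layout()
```

Call plt.show() afterwards to display the figure. The number of axes passed through ax has to match the number of plotted columns, which is why each call receives exactly one column of the axes array.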

You can find more info here: Multiple histograms in Pandas (possibly a duplicate, by the way), but apparently pandas cannot handle multiple histograms on the same graph.

That's fine, because np.histogram and matplotlib.pyplot can. Check the link above for a more complete answer.
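For a single column, the np.histogram plus matplotlib.pyplot route could look roughly like the sketch below. It is not taken from the linked answer; the sample data and the names values, counts, edges and ax_cdf are made up for illustration. The counts are computed once, the cumulative fraction is derived from them, and the two are drawn on twin y-axes.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data standing in for one dataframe column
rng = np.random.default_rng(seed=1)
values = rng.normal(0, 1, size=200)

# Bin the data once, then derive the cumulative fraction from the counts
counts, edges = np.histogram(values, bins=10)
cdf = np.cumsum(counts) / counts.sum()

# Ordinary histogram drawn as bars on the left y-axis
fig, ax = plt.subplots(figsize=(5, 4))
ax.bar(edges[:-1], counts, width=np.diff(edges), align='edge',
       edgecolor='white', alpha=0.7)
ax.set_ylabel('count')

# Cumulative fraction drawn as a step line on a second y-axis;
# each value is placed at the right edge of its bin
ax_cdf = ax.twinx()
ax_cdf.step(edges[1:], cdf, where='post', color='tab:orange', linewidth=2)
ax_cdf.set_ylim(0, 1.05)
ax_cdf.set_ylabel('cumulative fraction')
```

Call plt.show() to display the result. Looping this over the columns of a dataframe, with one subplot per column, would give a grid similar to what the question asks for.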

Solution for overlapping histograms with df.hist with any number of subplots

You can combine two dataframe histogram figures by creating twin axes from the grid of axes returned by df.hist. Here is an example of normal histograms combined with cumulative step histograms, where the size of the figure and the layout of the grid of subplots are taken care of automatically:

import numpy as np               # v 1.19.2
import pandas as pd              # v 1.1.3
import matplotlib.pyplot as plt  # v 3.3.2

# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1)  # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)

# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols  # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10

# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
               layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)

# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]

# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True, 
        color='tab:orange', linewidth=2, grid=False)

# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)

plt.show()

[Figure: grid_hist_overlaid]



Solution for displaying histograms of different types side-by-side with matplotlib

To my knowledge, it is not possible to show the different types of plots side-by-side with df.hist. You need to create the figure from scratch, as in this example, which uses the same dataset as before:

# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars)  # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10

# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)

# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)

# Set shared y-axis for histograms
for ax in axs_hist:
    ax.set_ylim(max(axs_hist_ylims))

plt.show()

[Figure: grid_hist_cumh]
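The final loop in the example sets the same y-limits on every histogram by hand. A possible alternative, shown here only as a sketch, is to let matplotlib link the axes through the sharey argument of fig.add_subplot. The three-column frame and the names first_hist_ax and hist_axes are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Small hypothetical dataset, similar in shape to the one used above
rng = np.random.default_rng(seed=1)
df = pd.DataFrame(rng.exponential(1, size=(100, 3)), columns=list('ABC'))

nvars = len(df.columns)
fig = plt.figure(figsize=(7, 2.5 * nvars))
fig.subplots_adjust(wspace=0.4, hspace=0.7)

first_hist_ax = None  # the axes every later histogram shares its y-axis with
hist_axes = []
for idx, var in enumerate(df.columns):
    # Left subplot: normal histogram, y-axis linked to the first histogram
    axh = fig.add_subplot(nvars, 2, 2 * idx + 1, sharey=first_hist_ax)
    axh.hist(df[var], bins=10, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    if first_hist_ax is None:
        first_hist_ax = axh
    hist_axes.append(axh)

    # Right subplot: cumulative step histogram with its own y-axis
    axc = fig.add_subplot(nvars, 2, 2 * idx + 2)
    axc.hist(df[var], bins=10, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)
```

Call plt.show() to display the figure. With linked axes the common y-range also follows later changes such as zooming, whereas limits set by hand are fixed at the moment they are computed.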
