
How to combine 2 dataframe histograms in 1 plot?

I would like to plot the histograms of every column in a dataframe, which df.hist(bins=10) does. However, I would also like to add a second set of histograms showing the CDF: df_hist=df.hist(cumulative=True,bins=100,density=1,histtype="step")

I tried separating their matplotlib axes by using fig=plt.figure() and plt.subplot(211), but df.hist is a pandas function, not a matplotlib one. I also tried creating the axes myself and passing ax=ax1 and ax=ax2 to each histogram call, but that didn't work.

How can I combine these histograms together? Any help?

The histograms I want to combine look like these. I want to show them side by side, or put the second one on top of the first. Sorry that I didn't bother to make them look good. [Image: histogram 1] [Image: histogram 2]

It is possible to draw them together:

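import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# toy data frame: 100 rows, 20 numeric columns
df = pd.DataFrame(np.random.normal(0,1,(100,20)))

# draw one histogram per column on a 5x4 grid
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)

# clone axes so they have different scales
# (kind='kde' needs SciPy to be installed)
ax_new = [ax.twinx() for ax in axes.flatten()]
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()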

Output:

[Image: output plot]
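
The example above overlays KDE curves, whereas the question asked for a cumulative (CDF) step histogram. Here is a minimal sketch of that variant. It assumes that df.plot(kind='hist') forwards cumulative, density and histtype to matplotlib's hist; the frame df_small and the 2x2 grid are illustrative choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Small toy frame (4 columns instead of 20, only to keep the figure readable)
rng = np.random.default_rng(0)
df_small = pd.DataFrame(rng.normal(0, 1, (100, 4)), columns=list("ABCD"))

# First layer: one ordinary histogram per column
fig, axes = plt.subplots(2, 2, figsize=(8, 6))
df_small.plot(kind="hist", subplots=True, ax=axes, bins=10, alpha=0.5)

# Second layer: one twin axis per subplot, so the 0..1 CDF scale
# does not share the y-axis with the raw counts
cdf_axes = [ax.twinx() for ax in axes.flatten()]
df_small.plot(kind="hist", subplots=True, ax=cdf_axes, bins=100,
              cumulative=True, density=True, histtype="step", legend=False)

fig.tight_layout()
# In an interactive session, call plt.show() here.
```

The twin axes keep the counts on the left y-axis and the cumulative fraction on the right one.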

It's also possible to draw them in separate subplots instead of overlaying them. For example:

fig, axes = plt.subplots(10,4, figsize=(16,10))
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)

kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)

This plots the histograms in the top five rows of the grid and the KDE curves in the bottom five.

You can find more information in "Multiple histograms in Pandas" (a possible duplicate, by the way), but apparently pandas cannot draw multiple histograms on the same graph.

That's OK, because np.histogram and matplotlib.pyplot can; see the linked question for a more complete answer.
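
As a rough illustration of the np.histogram route, the sketch below computes the bin counts and a cumulative fraction for a single column and draws both with plain matplotlib. The sample data and variable names are made up for the example; this is not taken from the linked question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(0, 1, 100)  # stand-in for one dataframe column

# Bin counts and bin edges (10 bins give 11 edges)
counts, edges = np.histogram(values, bins=10)

# Cumulative fraction of samples up to the right edge of each bin
cdf = np.cumsum(counts) / counts.sum()

fig, (ax_hist, ax_cdf) = plt.subplots(1, 2, figsize=(8, 3))

# Left: ordinary histogram drawn as bars aligned to the bin edges
ax_hist.bar(edges[:-1], counts, width=np.diff(edges), align="edge",
            edgecolor="white")
ax_hist.set_title("Histogram")

# Right: cumulative step curve, starting at 0 on the first edge
ax_cdf.step(edges, np.concatenate([[0.0], cdf]), where="pre",
            color="tab:orange")
ax_cdf.set_title("Cumulative (CDF)")

fig.tight_layout()
# In an interactive session, call plt.show() here.
```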

Solution for overlapping histograms with df.hist with any number of subplots

You can combine two dataframe histogram figures by creating twin axes using the grid of axes returned by df.hist . Here is an example of normal histograms combined with cumulative step histograms where the size of the figure and the layout of the grid of subplots are taken care of automatically:

import numpy as np               # v 1.19.2
import pandas as pd              # v 1.1.3
import matplotlib.pyplot as plt  # v 3.3.2

# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1)  # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)

# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols  # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10

# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
               layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)

# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]

# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True, 
        color='tab:orange', linewidth=2, grid=False)

# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)

plt.show()

[Image: grid_hist_overlaid]

Solution for displaying histograms of different types side-by-side with matplotlib

To my knowledge, it is not possible to show the different types of plots side-by-side with df.hist . You need to create the figure from scratch, like in this example using the same dataset as before:

# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars)  # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10

# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)

# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)

# Set a shared y-axis range for the histograms, using the largest
# upper limit found among them
ymax = max(top for _, top in axs_hist_ylims)
for ax in axs_hist:
    ax.set_ylim(0, ymax)

plt.show()

[Image: grid_hist_cumh]
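
A possible alternative that I have not checked against every pandas version: since the ax parameter of df.hist accepts a set of axes with one entry per numerical column (as used for the twin axes in the first solution), it may be enough to create a two-column grid with matplotlib and pass each column of axes to a separate df.hist call. The sketch below assumes this works and uses a small made-up frame, df_demo, with one row of subplots per variable.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative dataset with three numerical columns
rng = np.random.default_rng(seed=1)
df_demo = pd.DataFrame(rng.exponential(1, size=(100, 3)), columns=list("ABC"))

# One row per variable, two columns: histogram | cumulative step histogram
nvars = df_demo.columns.size
fig, axes = plt.subplots(nvars, 2, figsize=(8, 2.5 * nvars))

# Each call receives exactly nvars axes, one per numerical column
df_demo.hist(ax=axes[:, 0], bins=10, grid=False,
             edgecolor="white", linewidth=0.5)
df_demo.hist(ax=axes[:, 1], bins=10, grid=False, cumulative=True,
             density=True, histtype="step", color="tab:orange", linewidth=2)

# df.hist titles every subplot with the column name; relabel the right column
for ax_c, col in zip(axes[:, 1], df_demo.columns):
    ax_c.set_title(f"{col} - Cumulative step hist.")

fig.tight_layout()
# In an interactive session, call plt.show() here.
```

Compared with building every subplot by hand, this keeps the pandas call but gives up the variable-per-cell grid layout used above.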
