
How to combine 2 dataframe histograms in 1 plot?

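I would like to use a single piece of code to show all the histograms in a dataframe. That would be df.hist(bins=10). However, I would also like to add a second set of histograms showing the CDF: df_hist=df.hist(cumulative=True,bins=100,density=1,histtype="step")

I tried separating their matplotlib axes with fig=plt.figure() and plt.subplot(211). But df.hist is a pandas function, not a matplotlib function. I also tried creating the axes myself and passing ax=ax1 and ax=ax2 to each histogram call, but that did not work.

How can I combine these histograms? Any help?

The histograms I want to combine look like the ones below. I want to show them side by side, or put the second on top of the first. Sorry, I did not take care to make them look good. [image: histogram 1] [image: histogram 2]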

They can be drawn together:

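import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# toy data frame: 100 rows, 20 columns of standard normal samples
df = pd.DataFrame(np.random.normal(0,1,(100,20)))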

# draw hist
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)

# clone axes so they have different scales
ax_new = [ax.twinx() for ax in axes.flatten()]
# overlay the KDE curves on the twin axes (kind='kde' requires SciPy)
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()


They can also be drawn in separate subplots of the same figure. For example,

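# 10x4 grid: 40 axes for the 20 columns of df, two plots per column
fig, axes = plt.subplots(10,4, figsize=(16,10))

# first 20 axes (top 5 rows) hold the histograms
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)

# last 20 axes (bottom 5 rows) hold the KDE curves
kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)
plt.show()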

This draws the histograms in the top five rows of the grid, above the KDE plots in the bottom five rows.
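The question asks for cumulative step histograms rather than KDE curves. Below is a minimal sketch of the same stacked layout, assuming that df.plot(kind='hist') forwards cumulative, density and histtype to matplotlib's hist. The 4-column toy frame and the names df_small, hist_axes and cdf_axes are illustrative only and do not come from the answer.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Toy data frame (assumption: 4 columns keep the example small)
rng = np.random.default_rng(seed=0)
df_small = pd.DataFrame(rng.normal(0, 1, (100, 4)), columns=list("abcd"))

# Two rows of axes: histograms on top, cumulative step histograms below
fig, axes = plt.subplots(2, 4, figsize=(16, 6))
hist_axes, cdf_axes = axes[0], axes[1]

# One histogram per column in the top row
df_small.plot(kind="hist", subplots=True, ax=hist_axes, alpha=0.5, bins=10)

# Same call with the question's CDF options in the bottom row
# (assumed to be passed through to matplotlib's Axes.hist)
df_small.plot(kind="hist", subplots=True, ax=cdf_axes, bins=100,
              cumulative=True, density=True, histtype="step")
```

Calling plt.show() afterwards displays the figure. Each curve in the bottom row should end at 1, since density=True normalizes the cumulative counts.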

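You can find more information here: Multiple histograms in Pandas (possibly a duplicate, by the way). Apparently pandas cannot handle multiple histograms on the same graph.

That is fine, because np.histogram and matplotlib.pyplot can. Check the link above for a more complete answer.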
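As a rough illustration of the np.histogram plus matplotlib route, the sketch below draws a count histogram and its cumulative curve for one sample on a pair of twin axes. The helper name hist_with_cdf and the sample data are assumptions made for this example, not part of the linked answer.

```python
import numpy as np
import matplotlib.pyplot as plt


def hist_with_cdf(values, bins=10, ax=None):
    """Draw a count histogram and its normalized cumulative curve (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots()
    counts, edges = np.histogram(values, bins=bins)

    # Bars for the raw counts on the left y-axis
    ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge",
           alpha=0.5, edgecolor="white")

    # Fraction of observations up to and including each bin
    cdf = np.cumsum(counts) / counts.sum()

    # Second y-axis so the 0..1 scale does not squash the bars
    ax_cdf = ax.twinx()
    # Prepend 0 so that bin i is drawn at height cdf[i] across its full width
    ax_cdf.step(edges, np.concatenate([[0.0], cdf]), where="pre",
                color="tab:orange", linewidth=2)
    ax_cdf.set_ylim(0, 1.05)
    return ax, ax_cdf, counts, cdf


rng = np.random.default_rng(seed=1)
sample = rng.normal(0, 1, 100)
ax, ax_cdf, counts, cdf = hist_with_cdf(sample, bins=10)
```

To cover a whole dataframe, the same helper could be called once per column with an Axes taken from plt.subplots, followed by plt.show().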

Solution for overlaying histograms with df.hist, for any number of subplots

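You can combine two dataframe histogram figures by creating twin axes from the grid of axes returned by df.hist. Here is an example of normal histograms combined with cumulative step histograms, where the size of the figure and the layout of the subplot grid are handled automatically: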

import numpy as np               # v 1.19.2
import pandas as pd              # v 1.1.3
import matplotlib.pyplot as plt  # v 3.3.2

# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1)  # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)

# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols  # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10

# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
               layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)

# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]

# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True, 
        color='tab:orange', linewidth=2, grid=False)

# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)

plt.show()

[Figure: grid_hist_overlaid, normal histograms overlaid with cumulative step histograms]
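The comment about unused axes can be checked directly. The sketch below rebuilds a 7-variable frame like the one above (the names df7, grid7 and twins are illustrative) and counts how many of the axes returned by df.hist are visible. It assumes pandas hides the surplus axes instead of deleting them.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
letters = list("ABCDEFG")  # 7 variables, as in the example above
df7 = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)

# A 3x3 layout offers 9 slots for 7 variables
grid7 = df7.hist(layout=(3, 3), bins=10, grid=False)

n_slots = grid7.size
n_visible = sum(ax.get_visible() for ax in grid7.flat)
print(grid7.shape, n_slots, n_visible)

# Only the first df7.columns.size axes hold data, so only those get a twin
twins = [ax.twinx() for ax in grid7.flat[:df7.columns.size]]
```

Slicing with grid.flat[:nplots] therefore gives df.hist exactly one twin axes per numerical column, which is what its ax parameter expects.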



Solution for displaying different types of histograms side by side with matplotlib

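As far as I know, it is not possible to show different types of plots side by side with df.hist. You need to create the figure from scratch, as in this example, which uses the same dataset as before: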

# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars)  # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10

# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)

# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)

# Set shared y-axis for histograms
for ax in axs_hist:
    ax.set_ylim(max(axs_hist_ylims))

plt.show()

[Figure: grid_hist_cumh, normal histograms and cumulative step histograms side by side]
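If one row per variable is acceptable, a shorter variant is possible with plt.subplots and sharey='col', which links the y-axis within each column of the grid. This is only a sketch of an alternative layout, not the answer's method. The 3-column frame df_demo is an assumption made to keep the example small.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)
df_demo = pd.DataFrame(rng.exponential(1, size=(100, 3)), columns=list("ABC"))

nvars = df_demo.columns.size
# Left column: normal histograms, right column: cumulative step histograms.
# sharey='col' gives every histogram in a column the same y-axis limits.
fig, axs = plt.subplots(nvars, 2, sharey="col", squeeze=False,
                        figsize=(6, 2.25 * nvars))

for (ax_h, ax_c), var in zip(axs, df_demo.columns):
    ax_h.hist(df_demo[var], bins=10, edgecolor="white", linewidth=0.5)
    ax_h.set_title(f"{var} - Histogram", size=11)
    ax_c.hist(df_demo[var], bins=10, density=True, cumulative=True,
              histtype="step", color="tab:orange", linewidth=2)
    ax_c.set_title(f"{var} - Cumulative step hist.", size=11)

fig.subplots_adjust(wspace=0.4, hspace=0.7)
```

The trade-off is a taller figure: the add_subplot version above fits several variables in each row, while this one uses a fixed two-column grid.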

