How to combine 2 dataframe histograms in 1 plot?
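I would like to use a single piece of code to show all the histograms of a dataframe. That would be df.hist(bins=10). However, I would also like to add a second set of histograms showing the CDF: df_hist=df.hist(cumulative=True,bins=100,density=1,histtype="step")
I tried separating their matplotlib axes using fig=plt.figure() and plt.subplot(211). But df.hist is actually a pandas function, not a matplotlib function. I also tried creating the axes myself and passing ax=ax1 and ax=ax2 to each hist call, but that did not work.
How can I combine these histograms? Any help?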
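You can draw them together:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# toy data frame: 100 rows, 20 columns of standard normal samples
df = pd.DataFrame(np.random.normal(0,1,(100,20)))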
# draw hist
fig, axes = plt.subplots(5,4, figsize=(16,10))
df.plot(kind='hist', subplots=True, ax=axes, alpha=0.5)
# clone axes so the histogram (counts) and the KDE (density) have different y scales
ax_new = [ax.twinx() for ax in axes.flatten()]
# kind='kde' requires SciPy to be installed
df.plot(kind='kde', ax=ax_new, subplots=True)
plt.show()
Output: [figure not preserved]
You can also plot them side by side. For example:
fig, axes = plt.subplots(10,4, figsize=(16,10))
hist_axes = axes.flatten()[:20]
df.plot(kind='hist', subplots=True, ax=hist_axes, alpha=0.5)
kde_axes = axes.flatten()[20:]
df.plot(kind='kde', subplots=True, ax=kde_axes, alpha=0.5)
This draws the histograms in the top five rows, above the KDE plots.
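You can find more information here: "Multiple histograms in Pandas" (possibly a duplicate, by the way). Apparently pandas cannot handle multiple histograms on the same plot. That is fine, because np.histogram and matplotlib.pyplot can; see the linked question for a more complete answer.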
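As a rough illustration of the np.histogram route mentioned above (this is not taken from the linked answer), the sketch below computes one set of bin edges for all columns, counts each column with np.histogram, and draws the bars on a single matplotlib Axes. The helper name overlay_histograms, the sample data, and the column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def overlay_histograms(df, bins=10, ax=None):
    """Draw one histogram per column on a single Axes, using shared bin edges."""
    if ax is None:
        _, ax = plt.subplots()
    # Compute the edges from all values so the bars line up across columns
    edges = np.histogram_bin_edges(df.to_numpy().ravel(), bins=bins)
    counts = {}
    for col in df.columns:
        counts[col], _ = np.histogram(df[col], bins=edges)
        # Semi-transparent bars so overlapping histograms stay visible
        ax.bar(edges[:-1], counts[col], width=np.diff(edges), align='edge',
               alpha=0.5, label=str(col))
    ax.legend()
    return ax, edges, counts


# Hypothetical sample data with two columns
rng = np.random.default_rng(seed=1)
df_demo = pd.DataFrame({'A': rng.normal(0, 1, 100), 'B': rng.normal(1, 1, 100)})
ax, edges, counts = overlay_histograms(df_demo, bins=10)
```

Call plt.show() afterwards to display the figure. Sharing the edges is a design choice: with per-column edges the bars would have different widths and would be harder to compare.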
Solution for overlaying histograms with df.hist, for any number of subplots
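You can combine two dataframe histogram figures by creating twin axes from the grid of axes returned by df.hist. Here is an example of normal histograms combined with cumulative step histograms, where the size of the figure and the layout of the subplot grid are handled automatically: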
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset stored in a pandas dataframe
rng = np.random.default_rng(seed=1) # random number generator
letters = [chr(i) for i in range(ord('A'), ord('G')+1)]
df = pd.DataFrame(rng.exponential(1, size=(100, len(letters))), columns=letters)
# Set parameters for figure dimensions and grid layout
nplots = df.columns.size
ncols = 3
nrows = int(np.ceil(nplots/ncols))
subp_w = 10/ncols # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Plot grid of histograms with pandas function (with a shared y-axis)
grid = df.hist(grid=False, sharey=True, figsize=(ncols*subp_w, nrows*subp_h),
               layout=(nrows, ncols), bins=bins, edgecolor='white', linewidth=0.5)
# Create list of twin axes containing second y-axis: note that due to the
# layout, the grid object may contain extra unused axes that are not shown
# (here in the H and I positions). The ax parameter of df.hist only accepts
# a number of axes that corresponds to the number of numerical variables
# in df, which is why the flattened array of grid axes is sliced here.
grid_twinx = [ax.twinx() for ax in grid.flat[:nplots]]
# Plot cumulative step histograms over normal histograms: note that the grid layout is
# preserved in grid_twinx so no need to set the layout parameter a second time here.
df.hist(ax=grid_twinx, histtype='step', bins=bins, cumulative=True, density=True,
        color='tab:orange', linewidth=2, grid=False)
# Adjust space between subplots after generating twin axes
plt.gcf().subplots_adjust(wspace=0.4, hspace=0.4)
plt.show()
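The twin axes are needed because the two histograms live on different scales: the normal histogram shows counts, while the cumulative step histogram with density=True runs from 0 to 1. As a minimal sketch of what that second curve represents (to my understanding, the bar heights are the running sum of the bin counts divided by the total count), the same values can be reproduced with NumPy alone. The sample x is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
x = rng.exponential(1, size=100)  # hypothetical sample, one "column"

# Plain histogram: counts per bin
counts, edges = np.histogram(x, bins=10)

# Cumulative, normalized histogram: fraction of samples at or below each bin's right edge
cdf = np.cumsum(counts) / counts.sum()

print(counts)  # integer counts, one per bin
print(cdf)     # non-decreasing values ending at 1.0
```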
Solution for showing different types of histograms side by side with matplotlib
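As far as I know, it is not possible to show different types of plots side by side with df.hist. You need to create the figure from scratch, as in this example, which uses the same dataset as before: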
# Set parameters for figure dimensions and grid layout
nvars = df.columns.size
plot_types = 2 # normal histogram and cumulative step histogram
ncols_vars = 2
nrows = int(np.ceil(nvars/ncols_vars))
subp_w = 10/(plot_types*ncols_vars) # 10 is the total figure width in inches
subp_h = 0.75*subp_w
bins = 10
# Create figure with appropriate size
fig = plt.figure(figsize=(plot_types*ncols_vars*subp_w, nrows*subp_h))
fig.subplots_adjust(wspace=0.4, hspace=0.7)
# Create subplots by adding a new axes per type of plot for each variable
# and create lists of axes of normal histograms and their y-axis limits
axs_hist = []
axs_hist_ylims = []
for idx, var in enumerate(df.columns):
    axh = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+1)
    axh.hist(df[var], bins=bins, edgecolor='white', linewidth=0.5)
    axh.set_title(f'{var} - Histogram', size=11)
    axs_hist.append(axh)
    axs_hist_ylims.append(axh.get_ylim())
    axc = fig.add_subplot(nrows, plot_types*ncols_vars, idx*plot_types+2)
    axc.hist(df[var], bins=bins, density=True, cumulative=True,
             histtype='step', color='tab:orange', linewidth=2)
    axc.set_title(f'{var} - Cumulative step hist.', size=11)
# Set shared y-axis for histograms
for ax in axs_hist:
    ax.set_ylim(max(axs_hist_ylims))
plt.show()
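The subplot indices in the loop above may look opaque. fig.add_subplot(nrows, ncols, index) uses a 1-based index that runs row by row, so idx*plot_types+1 and idx*plot_types+2 put each variable's two plots next to each other. The sketch below only illustrates that arithmetic; subplot_positions is a hypothetical helper, not part of matplotlib.

```python
def subplot_positions(idx, plot_types=2, ncols_vars=2):
    """Return (index, row, col) for each plot of the idx-th variable.

    index is the 1-based value passed to fig.add_subplot; row and col are
    the 0-based grid coordinates it corresponds to.
    """
    ncols_total = plot_types * ncols_vars
    positions = []
    for k in range(plot_types):
        index = idx * plot_types + k + 1          # 1-based subplot index
        row, col = divmod(index - 1, ncols_total)  # row-major grid position
        positions.append((index, row, col))
    return positions


# Seven variables, as in the sample dataset (columns A to G)
for idx, var in enumerate('ABCDEFG'):
    print(var, subplot_positions(idx))
```

With the default arguments, variables A and B share the first row, C and D the second, and so on, with G alone in the fourth row.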