How to change the bin size of each subplot when using DataFrame.plot in pandas

Question

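I have a DataFrame whose columns are all numeric, and the ranges of the data differ greatly between columns. The following code gives a representative example: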

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': np.random.randn(10000) * 20,
    'B': np.random.randn(10000) * 1000,
    'C': np.random.randn(10000) * 0.01,
    'D': np.random.randn(10000) * 300000,
    'E': np.random.randn(10000) * 500
})

axs = df.plot(kind='hist', subplots=True, bins=10, layout=(2, 3), figsize=(12, 8),
              title=list(df.columns), sharex=False, sharey=True)

for i, ax in enumerate(axs.reshape(-1)):
    if i >= len(df.columns):
        break
    ax.set_xlim(df[df.columns[i]].min(), df[df.columns[i]].max())

plt.suptitle('Histograms for all features')
plt.tight_layout()
plt.show()

When df.plot is called, the xlim range is set automatically to the range of the column with the largest values, which is why I added the for loop to fix it.

However, as you can see in the screenshot below, the bins are not scaled correctly.

[Screenshot: histograms with incorrectly sized bins]
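A likely cause, based on my reading of the pandas source for kind='hist' (worth checking against your pandas version): when bins is an integer, pandas computes one set of bin edges from the values of all numeric columns combined and reuses it for every subplot. Column D spans roughly ±1.2 million, so each shared bin is a couple of hundred thousand units wide, and columns A, B, C and E each fall into only one or two of those bins. The following NumPy sketch only imitates that behaviour to show the effect; it is not pandas' own code, and the seeded generator is my addition so the numbers are repeatable.

```python
import numpy as np
import pandas as pd

# Same synthetic data as the question; the seeded generator is an addition for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.standard_normal(10000) * 20,
    'B': rng.standard_normal(10000) * 1000,
    'C': rng.standard_normal(10000) * 0.01,
    'D': rng.standard_normal(10000) * 300000,
    'E': rng.standard_normal(10000) * 500,
})

# Imitates (assumption) pandas' hist plot: one set of 10 bins over ALL values in the frame.
shared_edges = np.histogram_bin_edges(df.to_numpy().ravel(), bins=10)
shared_width = shared_edges[1] - shared_edges[0]

summary = {}
for col in df.columns:
    # Bins computed from this column alone, which is what the question wants.
    own_edges = np.histogram_bin_edges(df[col], bins=10)
    # How many of the shared bins actually receive data from this column.
    counts, _ = np.histogram(df[col], bins=shared_edges)
    summary[col] = {
        'own_width': own_edges[1] - own_edges[0],
        'nonempty_shared_bins': int(np.count_nonzero(counts)),
    }
    print(f"{col}: shared bin width {shared_width:.3g}, "
          f"own bin width {summary[col]['own_width']:.3g}, "
          f"non-empty shared bins {summary[col]['nonempty_shared_bins']}")
```

Because D holds both the overall minimum and maximum, the shared edges are exactly D's own edges, which is why only D's subplot looks right.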

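I would like every subplot to show 10 bins, with a bin width that fits each histogram. Is there a way to do this, either in the df.plot call or by accessing the Axes objects in some way?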
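One way to do this through the Axes objects is to create the subplot grid yourself and plot each column as its own Series, so that the bin edges come from that column alone. This is a sketch rather than a df.plot option: per_column_hist is a hypothetical helper name, and hiding the unused sixth cell with set_visible(False) is my own choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def per_column_hist(df, bins=10, layout=(2, 3), figsize=(12, 8)):
    """Hypothetical helper: one histogram per column, bins computed per column."""
    nrows, ncols = layout
    # squeeze=False keeps axs 2-D even for a 1x1 layout.
    fig, axs = plt.subplots(nrows, ncols, figsize=figsize, squeeze=False)
    flat_axes = axs.ravel()
    for ax, col in zip(flat_axes, df.columns):
        # Plotting a single Series means the bin edges span this column's own range.
        df[col].plot(kind='hist', bins=bins, ax=ax)
        ax.set_title(col)
    # Hide grid cells left over when there are fewer columns than subplots.
    for ax in flat_axes[len(df.columns):]:
        ax.set_visible(False)
    fig.suptitle('Histograms for all features')
    fig.tight_layout()
    return fig, axs
```

Usage with the question's DataFrame: per_column_hist(df) followed by plt.show(). Note that zip silently drops columns that do not fit, so the layout must have at least as many cells as df has columns.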

Answer 1

You can use the pandas hist function (DataFrame.hist) instead.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    'A': np.random.randn(10000) * 20,
    'B': np.random.randn(10000) * 1000,
    'C': np.random.randn(10000) * 0.01,
    'D': np.random.randn(10000) * 300000,
    'E': np.random.randn(10000) * 500
})

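print(df.describe())  # per-column summary; shows how different the ranges are

# DataFrame.hist draws each column on its own Axes and creates its own figure,
# so no separate plt.figure() call is needed and bins are computed per column
axes = df.hist(bins=10, layout=(2, 3), density=True, figsize=(12, 8),
               sharex=False, sharey=False)
plt.tight_layout()
plt.show()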
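DataFrame.hist returns the grid of Axes it draws on, so the per-column bins can be checked (or the subplots adjusted further, as the question did with set_xlim). A minimal check, assuming pandas fills the grid row by row in column order, which the subplot titles it assigns should confirm; the seeded generator is an addition so the widths are repeatable. Call plt.show() afterwards to display the figure.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same synthetic data as in the question; seeded here (an addition) for repeatability.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.standard_normal(10000) * 20,
    'B': rng.standard_normal(10000) * 1000,
    'C': rng.standard_normal(10000) * 0.01,
    'D': rng.standard_normal(10000) * 300000,
    'E': rng.standard_normal(10000) * 500,
})

# DataFrame.hist returns a 2-D array of Axes shaped like `layout`.
axes = df.hist(bins=10, layout=(2, 3), density=True, figsize=(12, 8),
               sharex=False, sharey=False)

bin_widths = {}
# Assumption: the grid is filled row by row in column order.
for ax, col in zip(axes.ravel(), df.columns):
    # Each bar's width equals its bin width for a single-dataset bar histogram.
    bin_widths[col] = ax.patches[0].get_width()
    print(f"{ax.get_title()}: {len(ax.patches)} bars, bin width {bin_widths[col]:.3g}")
```

Each width should come out close to (max - min) / 10 for its own column, unlike the shared bins produced by df.plot.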

Solution 1 (2 votes, accepted; 2021-04-23 10:48:10)

在 pandas 中使用 Dataframe.plot 时如何更改每个子图的 bin 大小

问题描述

1 个解决方案

解决方案1 2 已采纳 2021-04-23 10:48:10

解决方案1
2 已采纳 2021-04-23 10:48:10