Matplotlib histogram missing bars

I am trying to plot a histogram using the code below:

```python
import matplotlib.pyplot as plt

# flt_data is my DataFrame (not shown); 'tree_dbh' holds the tree diameters
plt.subplots(figsize=(10, 6))
lbins = [0, 85, 170, 255, 340, 425]
plt.hist(flt_data['tree_dbh'], bins=lbins)
plt.gca().set(title='Tree diameter histogram', ylabel='Frequency')
```

The output is as follows:

[Image: histogram]

The histogram does not include all of the data.

These are the descriptive statistics of the column:

[Image: descriptive statistics]

You could set a logarithmic y-axis to make the tiny bars visible. You can also try seaborn's sns.boxenplot(x=flt_data['tree_dbh']) to get a better view of the distribution.

Here is an example with simulated data. df['data'].describe() shows:

```
count    65000.000000
mean        12.591938
std         13.316495
min          0.000000
25%          2.000000
50%          9.000000
75%         18.000000
max        150.000000
Name: data, dtype: float64
```

```python
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

# Simulated right-skewed data: squared normal samples truncated to integers
np.random.seed(2402)
df = pd.DataFrame({'data': (np.random.normal(3, 2, 65000) ** 2).astype(int)})
print(df['data'].describe())

# Same bin edges as in the question
lbins = [0, 85, 170, 255, 340, 425]

fig, (ax1, ax2, ax3) = plt.subplots(ncols=3, figsize=(14, 4))

# Linear y-axis: the tail bars are too short to see
ax1.hist(df['data'], bins=lbins, fc='skyblue', ec='black')
ax1.set_title('histogram with linear y-axis')

# Log y-axis: the small counts become visible
ax2.hist(df['data'], bins=lbins, fc='skyblue', ec='black')
ax2.set_yscale('log')
ax2.set_title('histogram with log y-axis')

# Boxen plot: shows the spread of the distribution and its long tail
sns.boxenplot(x=df['data'], color='skyblue', ax=ax3)
ax3.set_title('sns.boxenplot')

plt.tight_layout()
plt.show()
```

[Image: the histograms compared with sns.boxenplot]
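If you only need the log scale, plt.hist (and Axes.hist) also accepts log=True, which puts the y-axis on a log scale in the same call. Below is a minimal sketch on simulated data of the same kind as the example above (squared normal samples, drawn here with NumPy's Generator API, so the values will not match exactly); the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Simulated right-skewed data: squared normal samples truncated to integers
rng = np.random.default_rng(2402)
data = (rng.normal(3, 2, 65000) ** 2).astype(int)

lbins = [0, 85, 170, 255, 340, 425]

fig, ax = plt.subplots(figsize=(6, 4))
# log=True switches the y-axis to a log scale, like ax.set_yscale('log')
counts, edges, patches = ax.hist(data, bins=lbins, log=True, fc='skyblue', ec='black')
ax.set(title='histogram with log=True', ylabel='Frequency')
print(ax.get_yscale())
```

The returned counts array is also handy on its own: it holds the exact height of every bar, including the ones too small to see on a linear axis.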

It looks like almost all of your data is in the first bar. The other bars aren't missing; their counts are just very small compared to the first one.
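One way to confirm the bars exist is to compute the bin counts yourself and compare each one to the tallest bar. This sketch uses made-up data (10,000 small values plus five large ones) and np.histogram, which applies the same binning rule as plt.hist: each bin is half-open [a, b) except the last, which also includes its right edge.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 10,000 values in [0, 60) plus five large ones
data = np.concatenate([rng.integers(0, 60, 10_000), [90, 120, 200, 300, 425]])

lbins = [0, 85, 170, 255, 340, 425]
counts, edges = np.histogram(data, bins=lbins)

# Height of each bar relative to the tallest one
relative = counts / counts.max()
for i, (n, r) in enumerate(zip(counts, relative)):
    right = "]" if i == len(counts) - 1 else ")"
    print(f"[{int(edges[i])}, {int(edges[i + 1])}{right}: {int(n):>6} ({r:.4%} of the tallest bar)")
```

A bar with a count of 1 next to one with 10,000 is drawn at 0.01% of the axis height, which on a plot a few hundred pixels tall is well under one pixel.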

You have 652,173 values with a mean of 11.7 and a standard deviation of 8.6. The maximum of 425 is about 48 standard deviations above the mean, so it is most likely an outlier.
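The outlier argument can be made concrete with a z-score, the distance from the mean measured in standard deviations. The sketch below reproduces the figure for 425 from the mean and standard deviation quoted above. flag_outliers and its cutoff k = 3 are hypothetical (a common rule of thumb, not anything from the question); for data as skewed as this, z-scores are only a rough heuristic.

```python
import pandas as pd

def z_score(x: float, mean: float, std: float) -> float:
    """Distance of x from the mean, in standard deviations."""
    return (x - mean) / std

def flag_outliers(s: pd.Series, k: float = 3.0) -> pd.Series:
    """Hypothetical helper: True where a value is more than k std devs from the mean."""
    z = (s - s.mean()) / s.std()
    return z.abs() > k

# Figures quoted in the answer: mean 11.7, std 8.6, max 425
print(round(z_score(425, 11.7, 8.6), 1))  # 48.1
```

On the real column, flag_outliers(flt_data['tree_dbh']).sum() would count how many values exceed the cutoff.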

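Try narrower bins instead:

```python
lbins = np.arange(0, 100, 10)  # edges 0, 10, ..., 90
```

Note that plt.hist ignores values outside the outermost edges, so with these bins anything above 90 is not counted.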
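To keep narrow bins and still count the tail, one option is to add the data maximum as a final edge so the last bar becomes an overflow bin. A sketch on a small hypothetical array; np.histogram is used because it follows the same binning rule as plt.hist, so the counts match what the bars would show.

```python
import numpy as np

data = np.array([5, 12, 47, 88, 95, 150, 425])  # hypothetical values

narrow = np.arange(0, 100, 10)  # edges 0, 10, ..., 90
counts, _ = np.histogram(data, bins=narrow)
print(counts.sum())  # 4: 95, 150 and 425 lie beyond the last edge

# Overflow bin: extend to the data maximum, keeping the edges strictly increasing
last_edge = max(data.max(), narrow[-1] + 1)
with_tail = np.append(narrow, last_edge)
counts_tail, _ = np.histogram(data, bins=with_tail)
print(counts_tail.sum())  # 7
```

The overflow bar [90, 425] is much wider than the others, so plt.hist will draw it as a wide bar whose height is still the raw count; label it accordingly if you use this approach.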

You can also check len(flt_data['tree_dbh'][flt_data['tree_dbh'] > 85]); it tells you how many points are counted in the bars you can't see.
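A sketch of that check on a small hypothetical Series standing in for flt_data['tree_dbh']. (s > 85).sum() gives the same number as len(s[s > 85]) without building a filtered copy. One nuance: the first bin is [0, 85), so a value of exactly 85 is drawn in the second bar, which means (s >= 85).sum() matches the hidden bars exactly when all values lie within [0, 425].

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for flt_data['tree_dbh']
s = pd.Series([3, 10, 14, 22, 40, 85, 90, 200, 425])
lbins = [0, 85, 170, 255, 340, 425]

beyond = len(s[s > 85])           # the check from the answer
beyond_alt = int((s > 85).sum())  # same count, no filtered copy
hidden = int((s >= 85).sum())     # what the bars after the first contain (data in [0, 425])

# Per-bar counts labelled by bin edges (the last bin includes its right edge)
counts, _ = np.histogram(s, bins=lbins)
labels = [f"{a}-{b}" for a, b in zip(lbins[:-1], lbins[1:])]
print(pd.Series(counts, index=labels))
```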
