How to plot a histogram in matplotlib in Python?

I know how to plot a histogram when individual data points are given, like (33, 45, 54, 33, 21, 29, 15, ...),

by simply using something like matplotlib.pyplot.hist(x, bins=10),

but what if I only have grouped data like:

| Marks | Number of students |
| ----- | ------------------ |
| 0-10  | 8                  |
| 10-20 | 12                 |
| 20-30 | 24                 |
| 30-40 | 26                 |
| ...   | ...                |

and so on.

I know that I can use a bar plot to mimic a histogram by changing the xticks, but what if I want to do this using only the hist function of matplotlib.pyplot?

Is it possible to do this?

You can build the hist() parameters manually and pass the existing counts as weights.

Say you have this df:

>>> import pandas as pd
>>> import matplotlib.pyplot as plt
>>> df = pd.DataFrame({'Marks': ['0-10', '10-20', '20-30', '30-40'], 'Number of students': [8, 12, 24, 26]})
   Marks  Number of students
0   0-10                   8
1  10-20                  12
2  20-30                  24
3  30-40                  26

The bins are all the unique boundary values in Marks:

>>> bins = pd.unique(df.Marks.str.split('-', expand=True).astype(int).values.ravel())
array([ 0, 10, 20, 30, 40])

Choose one x value per bin, e.g. the left edge, to keep it simple:

>>> x = bins[:-1]
array([ 0, 10, 20, 30])

Use the existing counts (Number of students) as weights:

>>> weights = df['Number of students'].values
array([ 8, 12, 24, 26])

Then plug these into hist():

>>> plt.hist(x=x, bins=bins, weights=weights)

[Figure: the reconstructed histogram]
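For convenience, the steps above can be collected into one script. This is only a sketch under a few assumptions not in the original answer: it selects the non-interactive Agg backend so it runs without a display, adds axis labels and bar outlines, and writes to a file name (grouped_hist.png) chosen purely for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to use an interactive backend
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Marks": ["0-10", "10-20", "20-30", "30-40"],
    "Number of students": [8, 12, 24, 26],
})

# Split each "lo-hi" label into two integers, then keep the unique edges in order
edges = df["Marks"].str.split("-", expand=True).astype(int)
bins = pd.unique(edges.to_numpy().ravel())

x = bins[:-1]                                    # one representative value per bin (left edge)
weights = df["Number of students"].to_numpy()    # each x value counts this many times

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(x, bins=bins, weights=weights, edgecolor="black")
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
fig.savefig("grouped_hist.png")  # hypothetical output file name
```

Because every bin contains exactly one x value, the bar heights returned in counts are just the weights, so the plot reproduces the grouped table.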

One possibility is to "ungroup" the data yourself.

For example, for the 8 students with a mark between 0 and 10, you can generate 8 data points with value 5 (the midpoint of the bin). For the 12 students with a mark between 10 and 20, you can generate 12 data points with value 15.
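One way to sketch this ungrouping is with numpy.repeat, which repeats each bin midpoint as many times as that bin's count. The variable names and the use of NumPy are assumptions for illustration; the answer itself does not prescribe an implementation.

```python
import numpy as np

bins = np.array([0, 10, 20, 30, 40])   # bin edges taken from the grouped table
counts = np.array([8, 12, 24, 26])     # number of students per bin

# Midpoint of each bin: 5, 15, 25, 35
midpoints = (bins[:-1] + bins[1:]) / 2

# Repeat each midpoint once per student in that bin
ungrouped = np.repeat(midpoints, counts)

# Re-binning the synthetic points gives back the original counts
recovered, _ = np.histogram(ungrouped, bins=bins)
```

The resulting array can be passed to matplotlib.pyplot.hist(ungrouped, bins=bins) like any raw sample, as long as the same bin edges are used.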

However, the "ungrouped" data will only be an approximation of the real data. Thus, it is probably better to just use a matplotlib.pyplot.bar plot.
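A minimal sketch of the bar-plot alternative, assuming the same bin edges and counts as in the question's table: drawing each bar from its left edge with a width equal to the bin width makes the bars touch, so the result looks like a histogram. The Agg backend, the styling, and the output file name are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this line to use an interactive backend
import matplotlib.pyplot as plt
import numpy as np

bins = np.array([0, 10, 20, 30, 40])   # bin edges
counts = np.array([8, 12, 24, 26])     # number of students per bin

fig, ax = plt.subplots()
# align="edge" anchors each bar at its left edge; width spans the whole bin
bars = ax.bar(bins[:-1], counts, width=np.diff(bins), align="edge", edgecolor="black")
ax.set_xticks(bins)
ax.set_xlabel("Marks")
ax.set_ylabel("Number of students")
fig.savefig("grouped_bar.png")  # hypothetical output file name
```

Using np.diff(bins) for the widths also keeps the plot correct if the bins are not all the same size.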
