
Plotting binned data with uneven bins

I have a data set that I've managed to bin into intervals of 250, but I'm having a very difficult time plotting the values properly. I've had a look at

python plot simple histogram given binned data

How to make a histogram from a list of data

but in my case all I get is a single vertical line.

For reference, my binned data looks like:

(0, 250]                2
(250, 500]              1
(500, 750]              5
(750, 1000]            13
(1000, 1250]           77
(1250, 1500]          601
(1500, 1750]         1348
(1750, 2000]         3262
(2000, 2250]         3008
(2250, 2500]         5118
(2500, 2750]         4576
(2750, 3000]         5143
(3000, 3250]         3509
(3250, 3500]         4390
(3500, 3750]         2749
(3750, 4000]         2794
(4000, 4250]         1391
(4250, 4500]         1753
(4500, 4750]         1099
(4750, 5000]         1592
(5000, 5250]          688
(5250, 5500]          993
(5500, 5750]          540
(5750, 6000]          937
(6000, 6250]          405
(6250, 6500]          572
(6500, 6750]          202
(6750, 7000]          369
(7000, 7250]          164
(7250, 7500]          231
                     ... 
(7750, 8000]          285
(8000, 8250]           55
(8250, 8500]          116
(8500, 8750]           29
(8750, 9000]          140
(9000, 9250]           31
(9250, 9500]           68
(9500, 9750]           20
(9750, 10000]         132
(10000, 10250]         15
(10250, 10500]         29
(10500, 10750]         21
(10750, 11000]         73
(11000, 11250]         26
(11250, 11500]         36
(11500, 11750]         21
(11750, 12000]         74
(12000, 12250]          5
(12250, 12500]         50
(12500, 12750]         13
(12750, 13000]         34
(13000, 13250]          4
(13250, 13500]         45
(13500, 13750]         14
(13750, 14000]         53
(14000, 14250]          6
(14250, 14500]         17
(14500, 14750]          7
(14750, 15000]         79
(15000, 10000000]     256

where the last interval encompasses everything greater than 15,000. I've put the above values in a list and then attempted to plot:

bins = [i for i in range(0, 15001, 250)]
bins.append(10000000)
categories = pd.cut(data["price"], bins)
price_binned = list(pd.value_counts(categories).reindex(categories.cat.categories))
plt.hist(price_binned)

which produces a histogram with 12 bins. Adding the bins argument

plt.hist(price_binned, bins=(bin_num+1)) 

produces a histogram with a very tall vertical line on the left. Finally, I considered adding plt.xticks(bins), but then I get a graph that shows nothing.

Is there any way I could produce a histogram where the x-axis shows the bin ranges and the y-axis shows the counts in each bin?

[Image: using plt.bar()]

[Image: using plt.hist() with no bin argument]

[Image: using plt.hist() with bin=bins]

[Image: using seaborn]

The main problem seems to be that you are asking plt.hist() and sns.distplot() to create histograms of your pre-binned histogram data. Both functions expect raw observations and do the binning themselves, so passing them a list of counts produces a histogram of the counts.

You can use a bar chart instead, which accommodates your custom binning scheme, with the price_binned variable as follows:
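To make the point concrete, here is a small sketch of what happens when counts are histogrammed a second time. It uses np.histogram, which performs the same binning step that plt.hist() does by default, on the first ten counts from the question. The variable names are illustrative only.

```python
import numpy as np

# First ten bin counts copied from the question's table (already binned data)
counts = [2, 1, 5, 13, 77, 601, 1348, 3262, 3008, 5118]

# Histogramming the counts again: ten equal-width bins spanning
# min(counts)..max(counts). The edges are count values, not prices.
hist, edges = np.histogram(counts, bins=10)

print(hist)              # how many of the original bins have a count in each range
print(edges[0], edges[-1])  # the x-axis now runs from 1 to 5118
```

The x-axis of such a plot is "size of the count", not price, which is why the resulting figure bears no resemblance to the price distribution.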

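import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
x = list(range(len(price_binned)))  # one position per bin; bins holds one more edge than there are bins
ax.bar(x, price_binned, width=1, align='center')
ax.set_xticks(x)  # place a tick under every bar so the labels line up
ax.set_xticklabels([b + 125 for b in bins[:-1]], rotation=90)
plt.show()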

Here I have used the midpoint value as the label for each bin. This can be swapped for any other xtick label notation you prefer.

Here is the result I get using most of your data (some is missing): [image: result]
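As one possible alternative label notation, the sketch below labels each bar with its range and marks the open-ended last bin as "N+", since a midpoint is not very meaningful for a bin that runs up to 10,000,000. The edges and counts here are a shortened stand-in for the question's data, and make_labels is a hypothetical helper, not part of matplotlib or pandas.

```python
import matplotlib.pyplot as plt

# Illustrative stand-ins, not the full data: four regular bins plus one
# open-ended overflow bin, mirroring the layout in the question.
edges = [0, 250, 500, 750, 1000, 10_000_000]
counts = [2, 1, 5, 13, 256]

def make_labels(edges):
    """Label each bin by its range; the last (overflow) bin becomes 'N+'."""
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-2], edges[1:-1])]
    labels.append(f"{edges[-2]}+")
    return labels

labels = make_labels(edges)
positions = list(range(len(counts)))  # evenly spaced, one slot per bin

fig, ax = plt.subplots()
ax.bar(positions, counts, width=1, align="center", edgecolor="black")
ax.set_xticks(positions)              # pin one tick to each bar
ax.set_xticklabels(labels, rotation=45, ha="right")
ax.set_xlabel("price bin")
ax.set_ylabel("count")
fig.tight_layout()
plt.show()
```

Because the bars sit at evenly spaced positions rather than at their true x-values, the very wide last bin takes up no more room than the others.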
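If a numeric price axis is preferred over evenly spaced bars, one option is to pass the bin edges to hist() together with the counts as weights. The sketch below assumes the overflow bin is drawn as if it were one regular bin (250) wide, so that the 10,000,000 edge does not stretch the axis and squash everything else into a single line. The data is a shortened stand-in and plot_edges is an illustrative name.

```python
import matplotlib.pyplot as plt

# Shortened stand-in for the question's edges and counts
edges = [0, 250, 500, 750, 1000, 10_000_000]
counts = [2, 1, 5, 13, 256]

# Assumption: draw the open-ended last bin with the same width as the others
plot_edges = edges[:-1] + [edges[-2] + 250]

fig, ax = plt.subplots()
# One representative value per bin (its left edge), weighted by that bin's count,
# so hist() reproduces the existing counts instead of re-binning them.
heights, _, _ = ax.hist(plot_edges[:-1], bins=plot_edges, weights=counts)
ax.set_xticks(plot_edges[:-1])
ax.set_xlabel("price")
ax.set_ylabel("count")
plt.show()
```

The drawn width of the last bar is arbitrary under this assumption, so it is worth saying in the axis label or caption that it represents everything above the final regular edge.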
