

Creating histograms in pandas with columns with equidistant base, not proportional to the range

I am creating a histogram in pandas simply using:

train_data.hist("MY_VARIABLE", bins=[0,5, 10,50,100,500,1000,5000,10000,50000,100000])

(train_data is a pandas DataFrame.)

The problem is that, because the range [50000, 100000] is so wide, I can barely see the small ranges such as [0, 5] or [5, 10]. I would like the histogram to have equally spaced bars on the x-axis, with widths that are not proportional to the bin ranges. Is this possible?

You can do it this way:

bins = [0, 5, 10,50,100,500,1000,5000,10000,50000,100000]
df.groupby(pd.cut(df.a, bins=bins, labels=bins[1:])).size().plot.bar(rot=0)

Demo:

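import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,10**5,(10**4,2)),columns=list('ab'))
bins = [0, 5, 10,50,100,500,1000,5000,10000,50000,100000]
# bin column 'a', count the rows in each bin, and draw one equal-width bar per bin
df.groupby(pd.cut(df.a, bins=bins, labels=bins[1:])).size().plot.bar(rot=0)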

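To see why the bars come out equally spaced, it can help to look at the counts before plotting. pd.cut turns each value into a bin category, and the bar plot then draws one bar per category rather than one bar per unit of range. Below is a minimal sketch of that counting step. The sample values are made up for illustration, and include_lowest=True and observed=False are my additions, not part of the code above. With the default settings the first bin is open on the left, so a value of exactly 0 is not assigned to any bin.

```python
import pandas as pd

# Hypothetical sample values, chosen only for illustration.
values = pd.Series([0, 3, 5, 7, 42, 99, 250, 4000, 75000], name="a")
bins = [0, 5, 10, 50, 100, 500, 1000, 5000, 10000, 50000, 100000]

# Default behaviour: bins are (left, right], so the value 0 falls outside
# the first bin (0, 5] and becomes NaN.
default_binned = pd.cut(values, bins=bins)
n_unbinned = int(default_binned.isna().sum())

# include_lowest=True makes the first bin include its left edge.
binned = pd.cut(values, bins=bins, include_lowest=True)

# observed=False keeps empty bins in the result, so every bin gets a row
# (and, if plotted, a bar slot) even when its count is zero.
counts = values.groupby(binned, observed=False).size()

print(n_unbinned)
print(counts)
```

Chaining .plot.bar(rot=0) onto counts should then give one bar per bin, as in the answer above.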

Filtering results:

threshold = 100
(df.groupby(pd.cut(df.a,
                   bins=bins, 
                   labels=bins[1:]))
   .size()
   .to_frame('count')
   .query('count > @threshold')
)

Out[84]:
        count
a
5000      396
10000     492
50000    4044
100000   4961
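If you would rather not use .query(), the same filter can be written as a boolean mask on the counts. This is only a sketch: the counts below are made-up numbers indexed by the right edge of each bin, not the output of the demo above.

```python
import pandas as pd

# Hypothetical per-bin counts, indexed by the right edge of each bin.
counts = pd.Series([3, 150, 80, 420], index=[5, 10, 50, 100], name="count")
threshold = 100

# Keep only the bins whose count is strictly greater than the threshold.
filtered = counts[counts > threshold]
print(filtered)
```

The result is still a Series indexed by bin label, so .plot.bar(rot=0) can be chained onto it in the same way.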

Plotting the filtered results:

(df.groupby(pd.cut(df.a,
                   bins=bins, 
                   labels=bins[1:]))
   .size()
   .to_frame('count')
   .query('count > @threshold')
   .plot.bar(rot=0, width=1.0)
)

