
pandas.cut function gave me negative values when it is supposed to be 0

I am perplexed as to why pd.cut gave me a first interval with a negative lower bound. The column I cut on has a minimum value of 0, so I expected the first interval to be (0, 18) instead of (-0.18, 18).

I tried setting precision to 0, but that just turns the first interval into (-0.0, 18).

Also, why are my intervals all floats when the column I passed to pd.cut holds integers?

Here is a picture of my work.

I would appreciate any help. Thank you.

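As explained in the comments, you asked cut to define the bins automatically. By default it makes equal-width bins spanning the range of the data, and it also extends that range by 0.1% so that the minimum is included: with the default right-closed intervals, a first interval like (0, 18] would not contain 0, so the lowest edge is moved slightly below the minimum. That is where the negative bound comes from (an offset of 0.18 suggests a data range of 180). The precision argument only controls how the edges are rounded in the labels, which is why at precision 0 the -0.18 is simply shown as -0.0. The automatically computed edges are floating-point values, so the intervals are floats even though the column holds integers.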
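To make the offset concrete, here is a small sketch. It assumes a hypothetical integer column running from 0 to 180 (chosen to match the -0.18 in the question), recomputes the 10 equal-width edges by hand with the 0.1% shift of the lowest edge, and compares them with the edges pandas reports via retbins=True. The by-hand lines are an illustration of the behaviour for an integer bins argument with the default right=True, not pandas' own code.

```python
import numpy as np
import pandas as pd

# Hypothetical integer data: min 0, max 180 (range 180, so 0.1% of it is 0.18)
s = pd.Series(np.arange(0, 181))

# By-hand reconstruction of the automatic edges (default right=True):
mn, mx = float(s.min()), float(s.max())
edges = np.linspace(mn, mx, 10 + 1)   # 11 equal-width edges, always floats
edges[0] -= (mx - mn) * 0.001         # lowest edge pushed 0.1% of the range below the minimum

# The edges pandas actually used for the same call
s_cut, used_edges = pd.cut(s, bins=10, retbins=True)

print(edges[:3])                 # first edge sits slightly below 0
print(s_cut.cat.categories[0])   # label of the first interval
print(np.allclose(edges, used_edges))
```

Because the shifted edge is what places 0 inside the first right-closed interval, raising or lowering precision can only change how that edge is printed, not remove it.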

If you wish to keep the automatic binning, you can modify the intervals manually afterwards. Here is an example for the case where only the first interval is "incorrect", using cat.rename_categories:

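import numpy as np
import pandas as pd

np.random.seed(0)
# integer sample data; clip(lower=0) makes 0 the minimum, as in the question
s = pd.Series(np.random.randint(-10, 100, size=100)).clip(lower=0)
s_cut = pd.cut(s, bins=10)   # 10 automatic equal-width bins
print(s_cut.cat.categories)

# relabel only the first interval; which rows belong to it does not change
first_I = s_cut.cat.categories[0]
new_I = pd.Interval(0, first_I.right, closed=first_I.closed)
s_cut = s_cut.cat.rename_categories({first_I: new_I})
print(s_cut.cat.categories)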

output:

# before
IntervalIndex([(-0.095, 9.5], (9.5, 19.0], (19.0, 28.5], ...)

# after
IntervalIndex([(0.0, 9.5], (9.5, 19.0], (19.0, 28.5], ...)
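If automatic binning is not required, another option is to pass the edges yourself. The sketch below rebuilds the same sample data; the bin width of 10 and the use of right=False are illustrative assumptions, not part of the answer above. Left-closed intervals that start at 0 contain the minimum without any offset, and integer edges should keep the bounds integer. (By contrast, a renamed label such as (0.0, 9.5] is still right-closed, so strictly it does not contain 0, even though the rows with 0 remain assigned to it.)

```python
import numpy as np
import pandas as pd

np.random.seed(0)
s = pd.Series(np.random.randint(-10, 100, size=100)).clip(lower=0)

# Hypothetical edges: width 10, starting exactly at 0 and ending past the maximum
edges = np.arange(0, s.max() + 11, 10)

# right=False -> intervals are [a, b), so 0 falls in the first one, [0, 10)
s_cut = pd.cut(s, bins=edges, right=False)

print(s_cut.cat.categories)
print(s_cut.isna().sum())   # 0 means every value landed in some bin
```

The trade-off is that you now choose the bin width yourself, and the last edge must lie above the maximum for a left-closed layout to cover every value.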
