I would like to apply the pandas cut function to a Series that includes NaNs. The desired behavior is that it buckets the non-NaN elements and returns NaN for the NaN elements.
import pandas as pd
numbers_with_nan = pd.Series([3,1,2,pd.NaT,3])
numbers_without_nan = numbers_with_nan.dropna()
The cutting works fine for the series without NaNs:
pd.cut(numbers_without_nan, bins=[1,2,3], include_lowest=True)
0 (2.0, 3.0]
1 (0.999, 2.0]
2 (0.999, 2.0]
4 (2.0, 3.0]
When I cut the series that contains NaNs, element 3 is correctly returned as NaN, but the last element gets the wrong bin assigned:
pd.cut(numbers_with_nan, bins=[1,2,3], include_lowest=True)
0 (2.0, 3.0]
1 (0.999, 2.0]
2 (0.999, 2.0]
3 NaN
4 (0.999, 2.0]
How can I get the following output?
0 (2.0, 3.0]
1 (0.999, 2.0]
2 (0.999, 2.0]
3 NaN
4 (2.0, 3.0]
This is strange. The problem isn't pd.NaT itself, it's the fact that your series has object dtype instead of a regular numeric dtype, e.g. float or int.
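To confirm this diagnosis, you can inspect the dtype directly. This is a minimal check that assumes the same example data as in the question; the variable name `s` is only illustrative.

```python
import pandas as pd

# Mixing integers with pd.NaT prevents pandas from inferring a numeric dtype.
s = pd.Series([3, 1, 2, pd.NaT, 3])
print(s.dtype)           # the series is stored as generic Python objects

# For comparison, the same values with a float NaN are inferred as numeric.
s_float = pd.Series([3, 1, 2, float("nan"), 3])
print(s_float.dtype)
```

If the first dtype prints as object, pd.cut is operating on generic Python objects rather than on a numeric array.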
A quick fix is to replace pd.NaT with np.nan via fillna. This triggers conversion of the series from object to float64 dtype, and may also lead to better performance.
import numpy as np
import pandas as pd
s = pd.Series([3, 1, 2, pd.NaT, 3])
res = pd.cut(s.fillna(np.nan), bins=[1, 2, 3], include_lowest=True)
print(res)
0 (2, 3]
1 [1, 2]
2 [1, 2]
3 NaN
4 (2, 3]
dtype: category
Categories (2, object): [[1, 2] < (2, 3]]
A more generalized solution is to convert to numeric explicitly beforehand:
s = pd.to_numeric(s, errors='coerce')
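Put together, the explicit conversion might look like the sketch below. It reuses the example data from the question; the names `numeric` and `binned` are only illustrative, and the exact interval labels printed can differ between pandas versions.

```python
import pandas as pd

s = pd.Series([3, 1, 2, pd.NaT, 3])            # object dtype because of pd.NaT

# errors="coerce" turns anything that cannot be parsed as a number into NaN,
# so the result is a float64 series.
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.dtype)

# With a numeric dtype, the NaN stays NaN and the other values are binned.
binned = pd.cut(numeric, bins=[1, 2, 3], include_lowest=True)
print(binned)
```

Because the conversion is explicit, this variant does not rely on fillna changing the dtype as a side effect.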