pandas cut on a Series with NaN values

Question

I would like to apply the pandas cut function to a series that includes NaNs. The desired behavior is that it buckets the non-NaN elements and returns NaN for the NaN elements.

import pandas as pd
numbers_with_nan = pd.Series([3,1,2,pd.NaT,3])
numbers_without_nan = numbers_with_nan.dropna()

The cutting works fine for the series without NaNs:

pd.cut(numbers_without_nan, bins=[1,2,3], include_lowest=True)
0      (2.0, 3.0]
1    (0.999, 2.0]
2    (0.999, 2.0]
4      (2.0, 3.0]

When I cut the series that contains NaNs, element 3 is correctly returned as NaN, but the last element is assigned the wrong bin:

pd.cut(numbers_with_nan, bins=[1,2,3], include_lowest=True)
0      (2.0, 3.0]
1    (0.999, 2.0]
2    (0.999, 2.0]
3             NaN
4    (0.999, 2.0]

How can I get the following output?

0      (2.0, 3.0]
1    (0.999, 2.0]
2    (0.999, 2.0]
3             NaN
4      (2.0, 3.0]

Answer 1

This is strange. The problem isn't pd.NaT itself; it's that your series has object dtype instead of a regular numeric dtype such as float or int.
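To check this diagnosis on your own data, you can compare the dtype before and after an explicit numeric conversion. The sketch below uses only Series.dtype and pd.to_numeric; the variable name numeric is illustrative.

```python
import pandas as pd

# Mixing Python ints with pd.NaT keeps pandas from inferring a numeric dtype
s = pd.Series([3, 1, 2, pd.NaT, 3])
print(s.dtype)

# Coercing to numeric turns pd.NaT into NaN and gives a float64 series
numeric = pd.to_numeric(s, errors="coerce")
print(numeric.dtype)
print(numeric.isna().tolist())
```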

A quick fix is to replace pd.NaT with np.nan via fillna. This triggers conversion of the series from object to float64 dtype, and may also lead to better performance.

import numpy as np
import pandas as pd

s = pd.Series([3, 1, 2, pd.NaT, 3])

# Replace pd.NaT with np.nan before binning
res = pd.cut(s.fillna(np.nan), bins=[1, 2, 3], include_lowest=True)

print(res)

0    (2, 3]
1    [1, 2]
2    [1, 2]
3       NaN
4    (2, 3]
dtype: category
Categories (2, object): [[1, 2] < (2, 3]]

A more general solution is to convert to numeric explicitly beforehand:

s = pd.to_numeric(s, errors='coerce')
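Putting the explicit conversion together with the original cut call gives a complete sketch of this approach. It assumes the same sample series and bins as the question; the cat.codes check is just one convenient way to confirm that index 3 stays unassigned while index 4 lands in the (2, 3] bin.

```python
import pandas as pd

s = pd.Series([3, 1, 2, pd.NaT, 3])

# Convert to a numeric dtype first; anything unparseable (here pd.NaT) becomes NaN
numeric = pd.to_numeric(s, errors="coerce")

# Bin the numeric series; NaN entries stay NaN instead of receiving a bin
res = pd.cut(numeric, bins=[1, 2, 3], include_lowest=True)
print(res)

# Category codes: -1 marks the NaN entry, 0 and 1 index the two intervals
print(res.cat.codes.tolist())
```

Because errors='coerce' maps any non-numeric value to NaN, the same pattern also covers strings or other stray objects, not just pd.NaT.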

Answer 1: 3 votes, accepted, 2018-10-31 10:23:40