
pd.DataFrame.select_dtypes() includes timedelta dtype

Why is it expected behavior that this test code:

import numpy as np
import pandas as pd

test = pd.DataFrame({'bool' :[False, True], 'int':[-1,2], 'float': [-2.5, 3.4],
                     'compl':np.array([1-1j, 5]),
                     'dt'   :[pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
                     'td'   :[pd.Timestamp('2012-03-02')- pd.Timestamp('2016-10-20'),
                              pd.Timestamp('2010-07-12')- pd.Timestamp('2000-11-10')]})
test.dtypes
test.select_dtypes(np.number)

produces a DataFrame with the timedelta column included?

bool                bool
int                int64
float            float64
compl         complex128
dt        datetime64[ns]
td       timedelta64[ns]
dtype: object

    int     float   compl   td
0    -1     -2.5    (1-1j)  -1693 days
1     2      3.4    (5+0j)   3531 days

EDIT:

For anyone else (including me), the following may be helpful:

I also found out why this behavior surprised me at first. I had been using a different way to check whether a column of a pd.DataFrame has a numeric dtype, namely pd.api.types.is_numeric_dtype:

for col in test.columns:
    if pd.api.types.is_numeric_dtype(test[col]):
        print(test[col].dtype)

bool
int64
float64
complex128

This produces more 'human-desired' output.
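The same check can be used to pick columns rather than just print their dtypes. Below is a minimal sketch on a smaller, made-up frame (the names `df`, `numeric_cols` and `numeric_df` are illustrative, not from the original code); it assumes `is_numeric_dtype` treats timedelta as non-numeric, as the loop output above suggests.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Small illustrative frame: two numeric columns and one timedelta column.
df = pd.DataFrame({
    "int": [-1, 2],
    "float": [-2.5, 3.4],
    "td": pd.to_timedelta([1, 2], unit="D"),
})

# Keep only the columns that is_numeric_dtype reports as numeric.
numeric_cols = [col for col in df.columns if is_numeric_dtype(df[col])]
numeric_df = df[numeric_cols]
print(numeric_cols)
```

Note that this check also counts bool columns as numeric, which `select_dtypes(np.number)` does not.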

Because that's how it has been implemented:

np.issubdtype(np.timedelta64, np.number)
# True

More specifically,

np.issubdtype(np.timedelta64, np.integer)
# True

The timedelta and datetime dtypes in NumPy are represented internally as integers. This makes them easy to represent in memory and makes arithmetic on datetimes fast.
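A quick way to see both points is sketched below (variable names are illustrative). It also shows why the dt column was not selected in the question: although datetime64 is stored as an integer too, only timedelta64 is registered under np.number in NumPy's scalar type hierarchy.

```python
import numpy as np

# timedelta64 is a subtype of np.number; datetime64 is not.
td_is_number = np.issubdtype(np.timedelta64, np.number)
dt_is_number = np.issubdtype(np.datetime64, np.number)

# The underlying storage is a 64-bit integer count of the unit (days here);
# view() reinterprets the same bytes as int64 without copying.
deltas = np.array([1, 2], dtype="timedelta64[D]")
raw = deltas.view("int64")

print(td_is_number, dt_is_number, raw.tolist())
```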

If you want to exclude these types from your checks, you can specify an exclude argument:

test.select_dtypes(include=['number'], exclude=['datetime', 'timedelta'])

   int  float   compl
0   -1   -2.5  (1-1j)
1    2    3.4  (5+0j)

Since numpy.timedelta64 is a subtype of numpy.number, if you only want the truly numeric columns returned, you can list the dtypes explicitly:

num= ['int16', 'int32', 'int64', 'float16', 'float32', 'float64','complex128']
test.select_dtypes(include=num)
Out[715]: 
    compl  float  int
0  (1-1j)   -2.5   -1
1  (5+0j)    3.4    2
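One caveat with an explicit list is that it silently skips any dtype left out of it (int8, the unsigned integers, complex64, and so on). As a hedged alternative, the sketch below keeps the generic 'number' selector, adds 'bool' to mirror what is_numeric_dtype reports, and excludes only timedelta; the frame and the name `by_name` are made up for illustration.

```python
import pandas as pd

# Illustrative frame with a bool, two numeric and one timedelta column.
df = pd.DataFrame({
    "bool": [False, True],
    "int": [-1, 2],
    "float": [-2.5, 3.4],
    "td": pd.to_timedelta([1, 2], unit="D"),
})

# Generic numeric selector plus bool, with timedelta explicitly excluded.
by_name = df.select_dtypes(include=["number", "bool"], exclude=["timedelta"])
print(list(by_name.columns))
```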
