pd.DataFrame.select_dtypes() includes timedelta dtype
Why is it expected behavior that this test code:
import numpy as np
import pandas as pd

test = pd.DataFrame({'bool': [False, True], 'int': [-1, 2], 'float': [-2.5, 3.4],
                     'compl': np.array([1-1j, 5]),
                     'dt': [pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
                     'td': [pd.Timestamp('2012-03-02') - pd.Timestamp('2016-10-20'),
                            pd.Timestamp('2010-07-12') - pd.Timestamp('2000-11-10')]})
test.dtypes
test.select_dtypes(np.number)
produces a DataFrame with the timedelta column included?
>>> bool               bool
>>> int               int64
>>> float           float64
>>> compl        complex128
>>> dt       datetime64[ns]
>>> td      timedelta64[ns]
>>> dtype: object

>>>    int  float   compl        td
>>> 0   -1   -2.5  (1-1j) -1693 days
>>> 1    2    3.4  (5+0j)  3531 days
For others (including me), the following may be helpful. I've also found why this behavior was unexpected to me at first: I had been using another way to check whether the dtype of a pd.DataFrame column is numeric, namely pd.api.types.is_numeric_dtype:
for col in test.columns:
    if pd.api.types.is_numeric_dtype(test[col]):
        print(test[col].dtype)
>>> bool
>>> int64
>>> float64
>>> complex128
This produces output closer to what a human would expect: bool counts as numeric, while the datetime and timedelta columns do not.
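To get a DataFrame back rather than a printed list of dtypes, the same check can drive column selection. A minimal sketch, rebuilding the test frame from the question; numeric_cols and numeric_df are illustrative names. Unlike select_dtypes(np.number), this keeps the bool column and drops the timedelta one.

```python
import numpy as np
import pandas as pd

# Same frame as in the question
test = pd.DataFrame({'bool': [False, True], 'int': [-1, 2], 'float': [-2.5, 3.4],
                     'compl': np.array([1-1j, 5]),
                     'dt': [pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
                     'td': [pd.Timestamp('2012-03-02') - pd.Timestamp('2016-10-20'),
                            pd.Timestamp('2010-07-12') - pd.Timestamp('2000-11-10')]})

# Keep only the columns that is_numeric_dtype accepts (bool, int, float, complex)
numeric_cols = [col for col in test.columns
                if pd.api.types.is_numeric_dtype(test[col])]
numeric_df = test[numeric_cols]
print(numeric_df.dtypes)
```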
Because that's how NumPy's type hierarchy is implemented, and select_dtypes follows it:
np.issubdtype(np.timedelta64, np.number)
# True
More specifically:
np.issubdtype(np.timedelta64, np.integer)
# True
timedelta and datetime dtypes in NumPy are internally represented as 64-bit integers. This makes them compact in memory and makes arithmetic on datetimes fast. Only np.timedelta64 is placed under np.number in NumPy's type hierarchy, though; np.datetime64 is not, which is why the dt column was not selected.
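A small sketch of both points: where the two types sit in NumPy's hierarchy, and that their values are plain int64 counts of a time unit underneath. The day-unit arrays are made-up inputs chosen for illustration (the timedelta values mirror the td column).

```python
import numpy as np

# timedelta64 derives from signedinteger, so it is both an integer and a number
print(np.issubdtype(np.timedelta64, np.integer))  # True
# datetime64 derives directly from np.generic, not from np.number
print(np.issubdtype(np.datetime64, np.number))    # False

# Both types store int64 counts of their unit (days here)
td = np.array([-1693, 3531], dtype='timedelta64[D]')
dt = np.array(['1970-01-02', '1970-01-11'], dtype='datetime64[D]')
td_ints = td.view(np.int64)  # raw day counts: -1693 and 3531
dt_ints = dt.view(np.int64)  # days since 1970-01-01: 1 and 10
print(td_ints, dt_ints)
```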
If you want to exclude these types from your checks, you can pass an exclude argument:
test.select_dtypes(include=['number'], exclude=['datetime', 'timedelta'])
   int  float   compl
0   -1   -2.5  (1-1j)
1    2    3.4  (5+0j)
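Because np.datetime64 is not under np.number, listing 'datetime' in exclude is harmless but not strictly needed for this frame. A minimal sketch checking that, rebuilding the question's test frame; numeric_only is an illustrative name.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'bool': [False, True], 'int': [-1, 2], 'float': [-2.5, 3.4],
                     'compl': np.array([1-1j, 5]),
                     'dt': [pd.Timestamp('2013-01-02'), pd.Timestamp('2016-10-20')],
                     'td': [pd.Timestamp('2012-03-02') - pd.Timestamp('2016-10-20'),
                            pd.Timestamp('2010-07-12') - pd.Timestamp('2000-11-10')]})

# 'number' picks up int, float, complex and timedelta (but not bool or datetime)
with_td = test.select_dtypes(include='number')
# Excluding 'timedelta' alone is enough to drop the td column
numeric_only = test.select_dtypes(include='number', exclude='timedelta')
print(numeric_only)
```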
Since numpy.timedelta64 belongs to numpy.number, if you want only the truly numeric columns returned, list the numeric dtypes explicitly:
num = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'complex128']
test.select_dtypes(include=num)
Out[715]:
    compl  float  int
0  (1-1j)   -2.5   -1
1  (5+0j)    3.4    2
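The explicit list above also leaves out bool (possibly intended), int8, the unsigned integer types and complex64. Swapping in np.integer would likely not help, since np.timedelta64 is itself a subclass of np.integer, as the issubdtype check earlier shows. One alternative, not taken from the original answer, is to filter on the one-character dtype.kind code: NumPy uses 'i', 'u', 'f' and 'c' for signed integers, unsigned integers, floats and complex numbers, and 'm', 'M' and 'b' for timedelta, datetime and bool. A sketch under those assumptions; the frame is a hypothetical variant of the question's with an added uint8 column named 'u8'.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's frame, with an extra uint8 column
df = pd.DataFrame({
    'bool': [False, True],
    'u8': np.array([1, 2], dtype=np.uint8),
    'int': [-1, 2],
    'float': [-2.5, 3.4],
    'compl': np.array([1-1j, 5]),
    'dt': pd.to_datetime(['2013-01-02', '2016-10-20']),
    'td': pd.to_timedelta([-1693, 3531], unit='D'),
})

# The explicit list from the answer misses the uint8 column
num = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64', 'complex128']
listed = df.select_dtypes(include=num)

# Filtering on dtype.kind keeps every integer, float and complex width,
# while kinds 'm' (timedelta), 'M' (datetime) and 'b' (bool) are left out
num_cols = [col for col in df.columns if df[col].dtype.kind in 'iufc']
print(df[num_cols])
```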