Truncate `TimeStamp` column to hour precision in pandas `DataFrame`
I have a `pandas.DataFrame` called `df` which has an automatically generated index and a column `dt`:
df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))
What I'd like to do is create a new column truncated to hour precision. I'm currently using:
from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))
This works, so that's fine. However, I have an inkling there's a nicer way using `pandas.tseries.offsets`, creating a `DatetimeIndex`, or something similar.
So if possible, is there some `pandas` wizardry to do this?
In pandas 0.18.0 and later, there are datetime `floor`, `ceil` and `round` methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:
>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
dt dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
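To make the difference between the three methods concrete, here is a minimal, self-contained sketch that rebuilds the sample frame from the output above and applies each of them with the `'h'` frequency. The column names `floor_h`, `ceil_h` and `round_h` are chosen only for illustration.

```python
import pandas as pd

# Sample data copied from the example output above.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Illustrative column names; each method takes a fixed frequency string.
df["floor_h"] = df["dt"].dt.floor("h")  # round down to the hour
df["ceil_h"] = df["dt"].dt.ceil("h")    # round up to the hour
df["round_h"] = df["dt"].dt.round("h")  # round to the nearest hour

print(df)
```

Only `floor` truncates. `round` moves 17:39:24 up to 18:00:00, and `ceil` moves every row in this sample up to the next hour.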
Here's another alternative to truncate the timestamps. Unlike `floor`, it supports truncating to a precision such as year or month.
You can temporarily adjust the precision unit of the underlying NumPy `datetime64` datatype, changing it from `[ns]` to `[h]`:
df['dt'].values.astype('<M8[h]')
This truncates everything to hour precision. For example:
>>> df
dt
0 2014-10-01 10:02:45
1 2014-10-01 13:08:17
2 2014-10-01 17:39:24
>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
dt dt2
0 2014-10-01 10:02:45 2014-10-01 10:00:00
1 2014-10-01 13:08:17 2014-10-01 13:00:00
2 2014-10-01 17:39:24 2014-10-01 17:00:00
>>> df.dtypes
dt datetime64[ns]
dt2 datetime64[ns]
The same method should work for any other unit: months `'M'`, minutes `'m'`, and so on:
Keep up to year: '<M8[Y]'
Keep up to month: '<M8[M]'
Keep up to day: '<M8[D]'
Keep up to minute: '<M8[m]'
Keep up to second: '<M8[s]'
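As a sketch of the month and year cases, which `floor` cannot handle because they are not fixed frequencies, the snippet below applies the same NumPy cast. The sample dates and the column names `month` and `year` are made up for illustration. The explicit cast back to `'<M8[ns]'` is an extra step that is not in the answer above; it is there because, as far as I know, newer pandas versions may otherwise store the assigned column in a coarser unit than `datetime64[ns]`.

```python
import pandas as pd

# Hypothetical sample data spanning several months and two years.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-02-28 17:39:24",
    ])
})

values = df["dt"].values  # underlying NumPy datetime64 array

# Cast to month / year precision to truncate, then back to nanoseconds
# so the dtype of the new columns is explicit.
df["month"] = values.astype("<M8[M]").astype("<M8[ns]")
df["year"] = values.astype("<M8[Y]").astype("<M8[ns]")

print(df)
```

Each value in `month` is the first day of its month at midnight, and each value in `year` is 1 January of its year.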
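The approach I've used in the past to accomplish this is the following (quite similar to what you're already doing, but I thought I'd throw it out there anyway):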
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
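One thing to watch with `replace`: fields that are not passed keep their original values, so timestamps with a sub-second component are only truncated to the hour if `microsecond` and `nanosecond` are zeroed as well. A small sketch, using a made-up timestamp with microseconds, compares both variants against `dt.floor('h')`:

```python
import pandas as pd

# Hypothetical sample with a sub-second component.
s = pd.Series(pd.to_datetime(["2014-10-01 10:02:45.123456"]))

# Zeroing only minute and second leaves the microseconds in place.
partial = s.apply(lambda x: x.replace(minute=0, second=0))

# Zeroing every field below the hour truncates completely.
full = s.apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0)
)

print(partial[0])
print(full[0])
print(s.dt.floor("h")[0])
```

For data that only has second resolution, as in the question, both variants give the same result.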