Truncate a `TimeStamp` column to hour precision in a pandas `DataFrame`

Question

I have a pandas.DataFrame called df which has an automatically generated index and a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I have an inkling there's some nicer way using pandas.tseries.offsets, creating a DatetimeIndex, or similar.

So if possible, is there some pandas wizardry to do this?

Answer 1

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
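
For comparison, here is a small self-contained sketch of how floor, ceil and round differ at hour frequency. The sample frame and the column names floor_h, ceil_h and round_h are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Sample data mirroring the timestamps shown above.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# floor rounds down, ceil rounds up, round goes to the nearest hour.
df["floor_h"] = df["dt"].dt.floor("h")
df["ceil_h"] = df["dt"].dt.ceil("h")
df["round_h"] = df["dt"].dt.round("h")

print(df)
```

Only floor is a truncation; round moves 17:39:24 up to 18:00:00, which is usually not what you want when bucketing by hour.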

Here's another way to truncate the timestamps. Unlike floor, it supports truncating to a calendar precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

Keep up to year: '<M8[Y]'
Keep up to month: '<M8[M]'
Keep up to day: '<M8[D]'
Keep up to minute: '<M8[m]'
Keep up to second: '<M8[s]'
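
As a rough sketch of the calendar units listed above, the following truncates to month and to year. The sample data and the column names month_start and year_start are invented for the example, and it assumes timezone-naive timestamps, since .values on a tz-aware column returns UTC values. The explicit cast back to '<M8[ns]' is an addition here: recent pandas versions may otherwise store the result in a coarser unit than nanoseconds.

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-11-15 13:08:17",
        "2015-03-31 17:39:24",
    ])
})

# Cast to a coarser NumPy unit to drop the finer components, then cast
# back to nanoseconds so the resulting column has a predictable dtype.
values = df["dt"].values
df["month_start"] = values.astype("<M8[M]").astype("<M8[ns]")
df["year_start"] = values.astype("<M8[Y]").astype("<M8[ns]")

print(df)
```

Each value lands on the first instant of its month or year, which a fixed-frequency floor cannot express because months and years vary in length.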

Answer 2

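A method I've used in the past to accomplish this is the following (quite similar to what you're already doing, but I thought I'd throw it out there anyway):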

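# Also zero the sub-second fields so the result is exactly on the hour.
df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))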
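
One thing to watch with the replace approach: it only resets the fields you name. The sketch below uses made-up timestamps with fractional seconds to check a replace call against dt.floor('h'); the variable names partial, full and floored are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45.123456",
        "2014-10-01 13:08:17.000001",
    ])
})

# Zeroing only minute and second leaves the fractional second in place.
partial = df["dt"].apply(lambda x: x.replace(minute=0, second=0))

# Zeroing the sub-second fields as well gives a true hour truncation.
full = df["dt"].apply(
    lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0)
)

floored = df["dt"].dt.floor("h")
print((partial == floored).all())
print((full == floored).all())
```

If the data has whole-second resolution, as in the question, both variants give the same result; the difference only shows up with sub-second timestamps.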

Answer 1: 57, accepted, 2015-02-28 16:24:54
Answer 2: 2, 2015-02-28 18:42:14
