Truncate `TimeStamp` column to hour precision in pandas `DataFrame`

I have a pandas.DataFrame called df which has an automatically generated index, with a column dt:

df['dt'].dtype, df['dt'][0]
# (dtype('<M8[ns]'), Timestamp('2014-10-01 10:02:45'))

What I'd like to do is create a new column truncated to hour precision. I'm currently using:

from datetime import datetime
df['dt2'] = df['dt'].apply(lambda L: datetime(L.year, L.month, L.day, L.hour))

This works, so that's fine. However, I've an inkling there's some nicer way using pandas.tseries.offsets or creating a DatetimeIndex or similar.

So if possible, is there some pandas wizardry to do this?

In pandas 0.18.0 and later, there are datetime floor, ceil and round methods to round timestamps to a given fixed precision/frequency. To round down to hour precision, you can use:

>>> df['dt2'] = df['dt'].dt.floor('h')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00
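
For comparison, here is a minimal sketch of all three methods side by side. It rebuilds the three sample timestamps from the output above; the column names floor, ceil and round are just illustrative choices, not part of the original answer.

```python
import pandas as pd

# Rebuild the sample data shown above
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2014-10-01 10:02:45",
        "2014-10-01 13:08:17",
        "2014-10-01 17:39:24",
    ])
})

# Vectorised rounding through the .dt accessor
df["floor"] = df["dt"].dt.floor("h")  # always rounds down: 10:00, 13:00, 17:00
df["ceil"] = df["dt"].dt.ceil("h")    # always rounds up:   11:00, 14:00, 18:00
df["round"] = df["dt"].dt.round("h")  # nearest hour:       10:00, 13:00, 18:00

print(df)
```

Only floor matches the behaviour asked for in the question; ceil and round are shown to make the difference visible.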

Here's another alternative to truncate the timestamps. Unlike floor, it supports truncating to a precision such as year or month.

You can temporarily adjust the precision unit of the underlying NumPy datetime64 datatype, changing it from [ns] to [h]:

df['dt'].values.astype('<M8[h]')

This truncates everything to hour precision. For example:

>>> df
                       dt
0     2014-10-01 10:02:45
1     2014-10-01 13:08:17
2     2014-10-01 17:39:24

>>> df['dt2'] = df['dt'].values.astype('<M8[h]')
>>> df
                      dt                     dt2
0    2014-10-01 10:02:45     2014-10-01 10:00:00
1    2014-10-01 13:08:17     2014-10-01 13:00:00
2    2014-10-01 17:39:24     2014-10-01 17:00:00

>>> df.dtypes
dt     datetime64[ns]
dt2    datetime64[ns]

The same method should work for any other unit: months 'M', minutes 'm', and so on:

  • Keep up to year: '<M8[Y]'
  • Keep up to month: '<M8[M]'
  • Keep up to day: '<M8[D]'
  • Keep up to minute: '<M8[m]'
  • Keep up to second: '<M8[s]'

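A method I've used in the past to accomplish this is as follows (quite similar to what you're already doing, but I thought I'd throw it out there anyway):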

df['dt2'] = df['dt'].apply(lambda x: x.replace(minute=0, second=0, microsecond=0, nanosecond=0))
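
One thing to keep in mind with this approach: Timestamp.replace only resets the fields that are named, so any sub-second component survives unless microsecond and nanosecond are also set to zero. A small sketch, using a made-up timestamp with fractional seconds (not from the question's data):

```python
import pandas as pd

# Hypothetical timestamp with fractional seconds
ts = pd.Timestamp("2014-10-01 10:02:45.123456")

# Only minute and second are reset; the microseconds are left in place
partial = ts.replace(minute=0, second=0)

# Resetting the sub-second fields as well gives a clean hour boundary
full = ts.replace(minute=0, second=0, microsecond=0, nanosecond=0)

print(partial)  # 2014-10-01 10:00:00.123456
print(full)     # 2014-10-01 10:00:00
```

For data like the question's, which has whole seconds only, both forms give the same result.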
