
How to extract hours from DataFrame/Series column of timedelta objects?

My series s looks something like this:

0   0 days 09:14:29.142000
1   0 days 00:01:08.060000
2   1 days 00:08:40.192000
3   0 days 17:52:18.782000
4   0 days 01:56:44.696000
dtype: timedelta64[ns]

I'm having trouble understanding how to pull out the hours (rounded to the nearest hour).

Edit:

I realize I can do something like s[0].hours, which gives me 9L. So I can do s[0].hours + 24*s[0].days and then round accordingly using the minutes.

How can I do this on the entire series at once?

This is straight out of the pandas docs, and it is vectorized.

In [16]: s
Out[16]: 
0   0 days 09:14:29.142000
1   0 days 00:01:08.060000
2   1 days 00:08:40.192000
3   0 days 17:52:18.782000
4   0 days 01:56:44.696000
Name: 0, dtype: timedelta64[ns]

In [17]: s.dt.components      
Out[17]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     0      9       14       29           142             0            0
1     0      0        1        8            60             0            0
2     1      0        8       40           192             0            0
3     0     17       52       18           782             0            0
4     0      1       56       44           696             0            0

In [18]: s.dt.components.hours
Out[18]: 
0     9
1     0
2     0
3    17
4     1
Name: hours, dtype: int64
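Note that the hours component alone drops whole days (row 2 shows 0 rather than 24). To get what the question's edit describes, the components can be combined. The following is only a sketch: it rebuilds the example series from the values shown above, and the names comp, total_hours and rounded_hours are made up for illustration.

```python
import pandas as pd

# Rebuild the example series from the question.
s = pd.Series(pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
]))

comp = s.dt.components

# Whole hours, with each full day counted as 24 hours.
total_hours = comp.days * 24 + comp.hours

# Round to the nearest hour: add one when the minutes component is 30 or more.
rounded_hours = total_hours + (comp.minutes >= 30).astype(int)

print(rounded_hours.tolist())  # [9, 0, 24, 18, 2]
```

Everything here operates on whole columns, so it stays vectorized.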

Here's another way to approach this if you don't need the actual hours attribute, but rather the Timedelta expressed in terms of another unit (this is called frequency conversion):

In [31]: s/pd.Timedelta('1h')
Out[31]: 
0     9.241428
1     0.018906
2    24.144498
3    17.871884
4     1.945749
dtype: float64

In [32]: np.ceil(s/pd.Timedelta('1h'))
Out[32]: 
0    10
1     1
2    25
3    18
4     2
dtype: float64
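Keep in mind that np.ceil always rounds up, while the question asks for the nearest hour. A minimal sketch of the nearest-hour variant, assuming the same example series (rebuilt here so the snippet runs on its own; hours_float and nearest are illustrative names):

```python
import pandas as pd

# Rebuild the example series from the question.
s = pd.Series(pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
]))

# Frequency conversion: dividing by a one-hour Timedelta gives float hours.
hours_float = s / pd.Timedelta(hours=1)

# Round to the nearest hour instead of always rounding up.
nearest = hours_float.round().astype(int)

print(nearest.tolist())  # [9, 0, 24, 18, 2]
```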

Let's assume your time delta column is called "Delta". Then you can do it this way:

df['rh'] = df.Delta.apply(lambda x: round(pd.Timedelta(x).total_seconds() \
                          % 86400.0 / 3600.0) )

Each time delta is really a numpy.timedelta64 under the covers. It helps to cast it to a pandas Timedelta, which has more convenient methods. Here I just ask for the total number of seconds, lop off any multiples of 86400 (i.e. the whole days), and divide by 3600 (the number of seconds in an hour). That gives you a floating-point number of hours, which you then round.

[Image: updated data frame]

I assumed, btw, that you wanted just the hours, minutes, seconds, and partial seconds components considered in the rounded hours, but not the full days. If you want all the hours, including the days, just omit the modulo operation that lops off days:

df['rh2'] = df.Delta.apply(lambda x: round(pd.Timedelta(x).total_seconds() \
                           / 3600.0) )

Then you get:

[Image: alternate updated data frame]

It's also possible to do these calculations directly in numpy terms:

df['rh'] = df.Delta.apply(lambda x: round(x / np.timedelta64(1, 'h')) % 24 )
df['rh2'] = df.Delta.apply(lambda x: round(x / np.timedelta64(1, 'h')) )

Here np.timedelta64(1, 'h') is a duration of one hour, so dividing by it gives the number of hours as a float, and the optional % 24 lops off whole-day components (if desired).
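The same two columns can also be computed without apply, on the whole column at once. This is a sketch rather than part of the original answer: it assumes a DataFrame whose "Delta" column holds the example values from the question, and it uses the .dt.total_seconds() accessor in place of the per-row lambda.

```python
import pandas as pd

# Hypothetical DataFrame with the example values in a "Delta" column.
df = pd.DataFrame({"Delta": pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
])})

# Float hours for the whole column in one step.
hours = df["Delta"].dt.total_seconds() / 3600.0

# Rounded hours with whole days lopped off first (modulo 24 hours).
df["rh"] = (hours % 24).round().astype(int)

# Rounded hours with whole days included.
df["rh2"] = hours.round().astype(int)

print(df["rh"].tolist())   # [9, 0, 0, 18, 2]
print(df["rh2"].tolist())  # [9, 0, 24, 18, 2]
```

Because the modulo is applied before rounding, a value such as 23.7 hours rounds to 24 here, as it does in the first apply version; the numpy variant above rounds first and then takes % 24, which would give 0 in that case.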
