
How to extract hours from DataFrame/Series column of timedelta objects?

My series s looks like this:

0   0 days 09:14:29.142000
1   0 days 00:01:08.060000
2   1 days 00:08:40.192000
3   0 days 17:52:18.782000
4   0 days 01:56:44.696000
dtype: timedelta64[ns]

I'm having trouble understanding how to pull out the hours (rounded to the nearest hour).

Edit:

I realize I can do something like s[0].hours, which gives me 9L. So I can do s[0].hours + 24*s[0].days and then round accordingly using the minutes.

How can I do this on the entire series at once?

This is right out of the docs, and it is vectorized.

In [16]: s
Out[16]: 
0   0 days 09:14:29.142000
1   0 days 00:01:08.060000
2   1 days 00:08:40.192000
3   0 days 17:52:18.782000
4   0 days 01:56:44.696000
Name: 0, dtype: timedelta64[ns]

In [17]: s.dt.components      
Out[17]: 
   days  hours  minutes  seconds  milliseconds  microseconds  nanoseconds
0     0      9       14       29           142             0            0
1     0      0        1        8            60             0            0
2     1      0        8       40           192             0            0
3     0     17       52       18           782             0            0
4     0      1       56       44           696             0            0

In [18]: s.dt.components.hours
Out[18]: 
0     9
1     0
2     0
3    17
4     1
Name: hours, dtype: int64
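
Note that the hours component alone drops the whole days (row 2 shows 0, not 24). As a minimal sketch of the approach described in the question (hours plus 24 times days, rounded using the minutes), the components can be combined column-wise. The variable names here are illustrative, and treating 30 minutes or more as "round up" is an assumption about the desired rounding rule:

```python
import pandas as pd

# Rebuild the example Series from the question.
s = pd.Series(pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
]))

c = s.dt.components

# Whole hours including full days.
total_hours = c.days * 24 + c.hours

# Round up when the minutes component is 30 or more (assumed rule).
rounded = total_hours + (c.minutes >= 30).astype(int)

print(rounded.tolist())  # [9, 0, 24, 18, 2]
```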

Here's another way to approach this if you don't need the actual hours attribute, but rather the Timedelta expressed in another unit (this is called frequency conversion).

In [31]: s/pd.Timedelta('1h')
Out[31]: 
0     9.241428
1     0.018906
2    24.144498
3    17.871884
4     1.945749
dtype: float64

In [32]: np.ceil(s/pd.Timedelta('1h'))
Out[32]: 
0    10
1     1
2    25
3    18
4     2
dtype: float64
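
np.ceil always rounds up, whereas the question asks for the nearest hour. A small sketch of that variant, reusing the same frequency conversion and swapping in Series.round (the variable names are illustrative; note that exact halves round to the nearest even number):

```python
import pandas as pd

# Rebuild the example Series from the question.
s = pd.Series(pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
]))

# Frequency conversion: each timedelta expressed as a float number of hours.
hours = s / pd.Timedelta("1h")

# Round to the nearest whole hour and convert to integers.
nearest = hours.round().astype(int)

print(nearest.tolist())  # [9, 0, 24, 18, 2]
```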

Let's assume your time delta column there is called "Delta". Then you can do it this way:

df['rh'] = df.Delta.apply(lambda x: round(pd.Timedelta(x).total_seconds()
                                          % 86400.0 / 3600.0))

Each time delta is really a numpy.timedelta64 under the covers. It helps to cast it to a pandas Timedelta, which has more convenient methods. Here I just ask for the total number of seconds, lop off any multiples of 86400 (i.e. the seconds that make up full days), and divide by 3600 (the number of seconds in an hour). That gives you a floating point number of hours, which you then round.

(Image: updated DataFrame)

I assumed, btw, that you wanted just the hour, minutes, seconds, and partial seconds components considered in the rounded hours, but not the full days. If you want all the hours, including the days, just omit the modulo operation that lops off days:

df['rh2'] = df.Delta.apply(lambda x: round(pd.Timedelta(x).total_seconds()
                                           / 3600.0))

Then you get:

(Image: alternate update)

It's also possible to do these calculations directly in numpy terms:

df['rh'] = df.Delta.apply(lambda x: round(x / np.timedelta64(1, 'h')) % 24 )
df['rh2'] = df.Delta.apply(lambda x: round(x / np.timedelta64(1, 'h')) )

Here np.timedelta64(1, 'h') is a one-hour duration, so dividing by it expresses each value as a floating point number of hours, and the optional % 24 lops off whole day components (if desired).
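
The apply-based versions call a Python function once per row. As a rough sketch, the same two columns can be computed on the whole column at once with the .dt accessor. This assumes a DataFrame with a timedelta64 column named "Delta" as above, and the sample data is taken from the question:

```python
import pandas as pd

# Sample frame using the timedeltas from the question; "Delta" is the
# column name assumed in the answer above.
df = pd.DataFrame({"Delta": pd.to_timedelta([
    "0 days 09:14:29.142000",
    "0 days 00:01:08.060000",
    "1 days 00:08:40.192000",
    "0 days 17:52:18.782000",
    "0 days 01:56:44.696000",
])})

# Total duration of each row in seconds, as floats.
secs = df["Delta"].dt.total_seconds()

# Hours within the day: drop whole days first, then round.
df["rh"] = ((secs % 86400.0) / 3600.0).round().astype(int)

# All hours, including those contributed by whole days.
df["rh2"] = (secs / 3600.0).round().astype(int)

print(df[["rh", "rh2"]])
```

For this data, rh and rh2 differ only in the third row, where the full day contributes 24 hours to rh2.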

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
