
Calculate time difference between Pandas Dataframe indices

I am trying to add a deltaT column to a dataframe, where deltaT is the time difference between successive rows (as indexed in the timeseries).

time                 value

2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7

The desired result is something like the following (deltaT shown in minutes):

time                 value  deltaT

2012-03-16 23:50:00      1       0
2012-03-16 23:56:00      2       6
2012-03-17 00:08:00      3      12
2012-03-17 00:10:00      4       2
2012-03-17 00:12:00      5       2
2012-03-17 00:20:00      6       8
2012-03-20 00:43:00      7      23

Note that this uses numpy >= 1.7; for numpy < 1.7, see the conversion here: http://pandas.pydata.org/pandas-docs/dev/timeseries.html#time-deltas

Your original frame, with a datetime index:

In [196]: df
Out[196]: 
                     value
2012-03-16 23:50:00      1
2012-03-16 23:56:00      2
2012-03-17 00:08:00      3
2012-03-17 00:10:00      4
2012-03-17 00:12:00      5
2012-03-17 00:20:00      6
2012-03-20 00:43:00      7

In [199]: df.index
Out[199]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-03-16 23:50:00, ..., 2012-03-20 00:43:00]
Length: 7, Freq: None, Timezone: None
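
For anyone who wants to follow along, here is a minimal sketch that rebuilds a frame like the one above. The timestamps and the value column are copied from the question; naming the index "time" is an assumption made to match the later outputs.

```python
import pandas as pd

# Timestamps copied from the question's sample data.
times = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-17 00:10:00",
    "2012-03-17 00:12:00",
    "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])

# One integer column named "value", indexed by the timestamps.
df = pd.DataFrame({"value": range(1, 8)}, index=times)
df.index.name = "time"  # assumption: matches the index label in later outputs

print(df)
```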

Here is the timedelta64 of what you want:

In [200]: df['tvalue'] = df.index

In [201]: df['delta'] = (df['tvalue']-df['tvalue'].shift()).fillna(0)

In [202]: df
Out[202]: 
                     value              tvalue            delta
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00
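
As a rough cross-check, subtracting a shifted copy of a series should give the same result as calling diff() on it. The sketch below uses a shortened set of timestamps and fills the first gap with pd.Timedelta(0) rather than 0. That choice is an assumption on my part: it keeps the fill value the same type as the data, which may matter on newer pandas releases.

```python
import pandas as pd

# Shortened sample: first, second and last timestamps from the question.
idx = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-20 00:43:00",
])
tvalue = idx.to_series()

# Difference to the previous row, written two ways.
by_shift = (tvalue - tvalue.shift()).fillna(pd.Timedelta(0))
by_diff = tvalue.diff().fillna(pd.Timedelta(0))

print(by_shift)
print(by_shift.equals(by_diff))
```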

Getting the answer while disregarding the day difference (your last day is 3/20, the prior one is 3/17) is actually tricky:

In [204]: df['ans'] = df['delta'].apply(lambda x: x  / np.timedelta64(1,'m')).astype('int64') % (24*60)

In [205]: df
Out[205]: 
                     value              tvalue            delta  ans
2012-03-16 23:50:00      1 2012-03-16 23:50:00         00:00:00    0
2012-03-16 23:56:00      2 2012-03-16 23:56:00         00:06:00    6
2012-03-17 00:08:00      3 2012-03-17 00:08:00         00:12:00   12
2012-03-17 00:10:00      4 2012-03-17 00:10:00         00:02:00    2
2012-03-17 00:12:00      5 2012-03-17 00:12:00         00:02:00    2
2012-03-17 00:20:00      6 2012-03-17 00:20:00         00:08:00    8
2012-03-20 00:43:00      7 2012-03-20 00:43:00 3 days, 00:23:00   23
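
The same "minutes, ignoring whole days" result can probably be had without apply. This is only a sketch under the assumption that floor-dividing by a one-minute pd.Timedelta is acceptable here; the modulo by 24*60 is the same wrap-around used above.

```python
import pandas as pd

idx = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-17 00:10:00",
    "2012-03-17 00:12:00",
    "2012-03-17 00:20:00",
    "2012-03-20 00:43:00",
])
df = pd.DataFrame({"value": range(1, 8)}, index=idx)

# Gap to the previous row; the first row has no predecessor, so use a zero gap.
delta = df.index.to_series().diff().fillna(pd.Timedelta(0))

# Whole minutes elapsed, then wrapped to a single day (1440 minutes),
# which drops the 3-day part of the last gap.
df["ans"] = (delta // pd.Timedelta(minutes=1)) % (24 * 60)

print(df)
```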

We can use to_series to create a series whose index and values are both equal to the index keys, and then compute the differences between successive rows, which gives a timedelta64[ns] dtype. From there, the .dt accessor exposes the seconds attribute of the time portion, and dividing each element by 60 gives the result in minutes (optionally filling the first value with 0).

In [13]: df['deltaT'] = df.index.to_series().diff().dt.seconds.div(60, fill_value=0)
    ...: df                                 # use .astype(int) to obtain integer values
Out[13]: 
                     value  deltaT
time                              
2012-03-16 23:50:00      1     0.0
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7    23.0

Breaking this down step by step:

When we perform diff:

In [8]: ser_diff = df.index.to_series().diff()

In [9]: ser_diff
Out[9]: 
time
2012-03-16 23:50:00               NaT
2012-03-16 23:56:00   0 days 00:06:00
2012-03-17 00:08:00   0 days 00:12:00
2012-03-17 00:10:00   0 days 00:02:00
2012-03-17 00:12:00   0 days 00:02:00
2012-03-17 00:20:00   0 days 00:08:00
2012-03-20 00:43:00   3 days 00:23:00
Name: time, dtype: timedelta64[ns]

Seconds to minutes conversion:

In [10]: ser_diff.dt.seconds.div(60, fill_value=0)
Out[10]: 
time
2012-03-16 23:50:00     0.0
2012-03-16 23:56:00     6.0
2012-03-17 00:08:00    12.0
2012-03-17 00:10:00     2.0
2012-03-17 00:12:00     2.0
2012-03-17 00:20:00     8.0
2012-03-20 00:43:00    23.0
Name: time, dtype: float64

If you also want to include the date portion, which was excluded above (only the time portion was considered), dt.total_seconds gives the full elapsed duration in seconds, from which the minutes can again be calculated by division.

In [12]: ser_diff.dt.total_seconds().div(60, fill_value=0)
Out[12]: 
time
2012-03-16 23:50:00       0.0
2012-03-16 23:56:00       6.0
2012-03-17 00:08:00      12.0
2012-03-17 00:10:00       2.0
2012-03-17 00:12:00       2.0
2012-03-17 00:20:00       8.0
2012-03-20 00:43:00    4343.0    # <-- number of minutes in 3 days 23 minutes
Name: time, dtype: float64
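
The difference between seconds and total_seconds is easiest to see on a single gap. The sketch below takes the last two timestamps from the sample data and inspects the resulting Timedelta; the variable name gap is just illustrative.

```python
import pandas as pd

# Last gap in the sample data: 3 days and 23 minutes.
gap = pd.Timestamp("2012-03-20 00:43:00") - pd.Timestamp("2012-03-17 00:20:00")

print(gap.days)                  # 3: whole days are stored separately
print(gap.seconds)               # 1380: only the sub-day part (23 minutes)
print(gap.total_seconds())       # 260580.0: days and sub-day part combined
print(gap.total_seconds() / 60)  # 4343.0 minutes
```

So .dt.seconds silently drops whole days, while .dt.total_seconds() keeps them.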

Requires NumPy version >= 1.7.0.

You can also typecast df.index.to_series().diff() from timedelta64[ns] (nanoseconds, the default dtype) to timedelta64[m] (minutes). This is a frequency conversion; astyping is equivalent to floor division.

df['ΔT'] = df.index.to_series().diff().astype('timedelta64[m]')

                     value      ΔT
time                              
2012-03-16 23:50:00      1     NaN
2012-03-16 23:56:00      2     6.0
2012-03-17 00:08:00      3    12.0
2012-03-17 00:10:00      4     2.0
2012-03-17 00:12:00      5     2.0
2012-03-17 00:20:00      6     8.0
2012-03-20 00:43:00      7  4343.0

(ΔT dtype: float64)

If you want to convert to int, fill the NA values with 0 before converting:

>>> df.index.to_series().diff().fillna(0).astype('timedelta64[m]').astype('int')

time
2012-03-16 23:50:00       0
2012-03-16 23:56:00       6
2012-03-17 00:08:00      12
2012-03-17 00:10:00       2
2012-03-17 00:12:00       2
2012-03-17 00:20:00       8
2012-03-20 00:43:00    4343
Name: time, dtype: int64

For pandas versions > 0.24.0, the result can also be converted to the pandas nullable integer datatype (Int64):

>>> df.index.to_series().diff().astype('timedelta64[m]').astype('Int64')

time
2012-03-16 23:50:00    <NA>
2012-03-16 23:56:00       6
2012-03-17 00:08:00      12
2012-03-17 00:10:00       2
2012-03-17 00:12:00       2
2012-03-17 00:20:00       8
2012-03-20 00:43:00    4343
Name: time, dtype: Int64
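
The behaviour of astype('timedelta64[m]') may differ between pandas releases (newer ones appear to restrict which units astype accepts), so here is a sketch of an alternative that avoids it: floor division by a one-minute pd.Timedelta. This is an assumption-laden substitute, not the method shown above, and it uses a shortened set of timestamps.

```python
import pandas as pd

idx = pd.to_datetime([
    "2012-03-16 23:50:00",
    "2012-03-16 23:56:00",
    "2012-03-17 00:08:00",
    "2012-03-20 00:43:00",
])
diffs = idx.to_series().diff()

# Floor division by one minute; the leading NaT becomes NaN,
# so the intermediate result is float64.
minutes = diffs // pd.Timedelta(minutes=1)

# Nullable integers keep the missing first value as <NA>.
nullable = minutes.astype("Int64")

# Plain integers need the missing value filled first.
plain = minutes.fillna(0).astype("int64")

print(nullable)
print(plain)
```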

Timedelta data types support a large number of time units, as well as generic units which can be coerced into any of the other units.

Below are the date units:

Y   year
M   month
W   week
D   day

Below are the time units:

h   hour
m   minute
s   second
ms  millisecond
us  microsecond
ns  nanosecond
ps  picosecond
fs  femtosecond
as  attosecond
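
To make the unit codes concrete, here is a small NumPy sketch that converts between a few of the time units listed above. The variable name one_hour is illustrative only.

```python
import numpy as np

one_hour = np.timedelta64(1, "h")

# True division by a one-minute timedelta gives a plain float.
print(one_hour / np.timedelta64(1, "m"))  # 60.0

# astype re-expresses the same duration in another unit.
in_seconds = one_hour.astype("timedelta64[s]")
print(in_seconds == np.timedelta64(3600, "s"))  # True

# Durations in different time units compare by their actual length.
print(np.timedelta64(90, "s") > np.timedelta64(1, "m"))  # True
```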

If you want the difference with decimals, use true division, i.e., divide by np.timedelta64(1, 'm').
E.g., if df is as below:

                     value
time                      
2012-03-16 23:50:21      1
2012-03-16 23:56:28      2
2012-03-17 00:08:08      3
2012-03-17 00:10:56      4
2012-03-17 00:12:12      5
2012-03-17 00:20:00      6
2012-03-20 00:43:43      7

Compare astyping (floor division) with true division below.

>>> df.index.to_series().diff().astype('timedelta64[m]')
time
2012-03-16 23:50:21       NaN
2012-03-16 23:56:28       6.0
2012-03-17 00:08:08      11.0
2012-03-17 00:10:56       2.0
2012-03-17 00:12:12       1.0
2012-03-17 00:20:00       7.0
2012-03-20 00:43:43    4343.0
Name: time, dtype: float64

>>> df.index.to_series().diff()/np.timedelta64(1, 'm')
time
2012-03-16 23:50:21            NaN
2012-03-16 23:56:28       6.116667
2012-03-17 00:08:08      11.666667
2012-03-17 00:10:56       2.800000
2012-03-17 00:12:12       1.266667
2012-03-17 00:20:00       7.800000
2012-03-20 00:43:43    4343.716667
Name: time, dtype: float64
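
The same comparison can be sketched with the // and / operators directly, which sidesteps astype altogether. This uses the first three timestamps from the frame above and assumes a pd.Timedelta of one minute as the divisor (np.timedelta64(1, 'm') should work the same way).

```python
import pandas as pd

idx = pd.to_datetime([
    "2012-03-16 23:50:21",
    "2012-03-16 23:56:28",
    "2012-03-17 00:08:08",
])
diffs = idx.to_series().diff()
one_minute = pd.Timedelta(minutes=1)

floored = diffs // one_minute  # whole minutes only: NaN, 6.0, 11.0
exact = diffs / one_minute     # fractional minutes: NaN, 6.116667, 11.666667

print(floored)
print(exact)
```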


