How to calculate the time difference between two different fields of consecutive rows in pandas?

Question

For example, suppose I have the following dataframe:

  Task     Started_Time                Time_Duration (min)
   A       23/05/2016  07:00            02:03:38
   B       23/05/2016  09:45            08:03:38
   A       23/05/2016  12:00            00:30:38
   A       23/05/2016  15:30            01:03:38
   A       23/05/2016  21:00            26:03:38
   B       23/05/2016  18:00            30:03:38

How can I add the time delta to the datetime to get the "Finished_Time"?

And how can I group the rows by task (A, B, ...) and find the "Free_Time" before the next task starts?

(For example, the first task A finishes at 07:00 + 02:03:38 = 09:03:38. How do I find the "Free_Time" before the next task A starts at 12:00:00?)

Here is how I created this dataframe.

import pandas as pd
Task = ['A', 'B', 'A', 'A', 'A', 'B']
Started_Time = ['23/05/2016  07:00:00', '23/05/2016  09:45:00', '23/05/2016  12:00:00', '23/05/2016  15:30:00', '23/05/2016  21:00:00', '23/05/2016  18:00:00']
Time_Duration = ['02:03:38', '08:03:38', '00:30:38', '01:03:38', '26:03:38', '30:03:38']
df = pd.DataFrame({'Task': Task, 'Started_Time': Started_Time, 'Time_Duration': Time_Duration})

When I try to convert "Started_Time" to datetime using this:

df['Started_Time'] = df['Started_Time'].values.astype('datetime64[D]')

I get the following error:

ValueError: Error parsing datetime string "23/05/2016 07:00" at position 2

How can I fix this error and add "Time_Duration" to it? I convert the duration to a timedelta like this:

df['Time_Duration'] = pd.to_timedelta(df['Time_Duration'],  unit = 'm')
df['Finished_Time'] = df['Started_Time'] + df['Time_Duration']

And to find the "Free_Time", I used this code:

df.sort_values(['Task'])
i=1
for index, row in df.iterrows():
    if df.iloc[i,1] == df.iloc[i+1,1]:
        df['Free_Time'] = df.iloc[i+1,2] + df.iloc[i,3]
        i+1
        print df['Free_Time']

And I get the following error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Answer 1

IIUC, you can do it this way (here StartedTime is the question's Started_Time column, already converted to datetime):

In [125]: df['Duration'] = df.groupby('Task')['StartedTime'].diff()

In [126]: df
Out[126]:
  Task         StartedTime  Duration
0    A 2016-05-23 07:00:00       NaT
1    B 2016-05-23 09:45:00       NaT
2    A 2016-05-23 12:00:00  05:00:00
3    A 2016-05-23 15:30:00  03:30:00
4    A 2016-05-23 21:00:00  05:30:00
5    B 2016-05-23 18:00:00  08:15:00

In [127]: df.sort_values(['Task', 'StartedTime'])
Out[127]:
  Task         StartedTime  Duration
0    A 2016-05-23 07:00:00       NaT
2    A 2016-05-23 12:00:00  05:00:00
3    A 2016-05-23 15:30:00  03:30:00
4    A 2016-05-23 21:00:00  05:30:00
1    B 2016-05-23 09:45:00       NaT
5    B 2016-05-23 18:00:00  08:15:00
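
The diff above only gives the gap between consecutive start times within each task. For the "Finished_Time" and "Free_Time" the question asks about, something along these lines might work. This is a minimal sketch, not a definitive implementation. It assumes the question's column names (Started_Time, Time_Duration), assumes the date strings are day-first with a single space before the time, and assumes "Free_Time" means the next start of the same task minus the current finish. The name next_start is just an illustrative helper.

```python
import pandas as pd

# Sample data from the question (single space between date and time assumed)
df = pd.DataFrame({
    'Task': ['A', 'B', 'A', 'A', 'A', 'B'],
    'Started_Time': ['23/05/2016 07:00:00', '23/05/2016 09:45:00',
                     '23/05/2016 12:00:00', '23/05/2016 15:30:00',
                     '23/05/2016 21:00:00', '23/05/2016 18:00:00'],
    'Time_Duration': ['02:03:38', '08:03:38', '00:30:38',
                      '01:03:38', '26:03:38', '30:03:38'],
})

# The strings are day-first, so give pandas an explicit format instead of
# casting the raw values with astype('datetime64[D]')
df['Started_Time'] = pd.to_datetime(df['Started_Time'],
                                    format='%d/%m/%Y %H:%M:%S')

# 'HH:MM:SS' strings can be parsed as-is; the unit argument is meant for
# numeric input, so it is left out here
df['Time_Duration'] = pd.to_timedelta(df['Time_Duration'])

# datetime + timedelta gives the finish time
df['Finished_Time'] = df['Started_Time'] + df['Time_Duration']

# Order rows by task and start time, then look up the next start per task
df = df.sort_values(['Task', 'Started_Time'])
next_start = df.groupby('Task')['Started_Time'].shift(-1)

# Free time before the next run of the same task (NaT for the last run)
df['Free_Time'] = next_start - df['Finished_Time']

print(df)
```

With the sample data, the first task A finishes at 09:03:38 and the next task A starts at 12:00:00, so its Free_Time comes out as 02:56:22. If a task runs past the next start of the same task, Free_Time will be negative, which may or may not be what you want.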
