Pandas Dataframe calculate Time difference for each group and Time difference between two different groups
I created a DataFrame like this:
```python
import pandas as pd

d = {
    'Time': ['01.07.2019, 06:21:33', '01.07.2019, 06:32:01', '01.07.2019, 06:57:33',
             '01.07.2019, 07:24:33', '01.07.2019, 08:26:25', '01.07.2019, 09:12:44'],
    'Action': ['Opened', 'Closed', 'Opened', 'Closed', 'Opened', 'Closed'],
    'Name': ['Bayer', 'Bayer', 'ITM', 'ITM', 'Geco', 'Geco'],
    'Group': ['1', '1', '2', '2', '3', '3'],
}
df = pd.DataFrame(data=d)
```
Output:

```
                   Time  Action   Name  Group
0  01.07.2019, 06:21:33  Opened  Bayer      1
1  01.07.2019, 06:32:01  Closed  Bayer      1
2  01.07.2019, 06:57:33  Opened    ITM      2
3  01.07.2019, 07:24:33  Closed    ITM      2
4  01.07.2019, 08:26:25  Opened   Geco      3
5  01.07.2019, 09:12:44  Closed   Geco      3
```
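Now I am trying to calculate the time difference within each group and the time difference between consecutive groups (in minutes). For example, the time difference within the Bayer group should be 10 minutes 28 seconds, and the time difference between Bayer and ITM should be 25 minutes 32 seconds. The time difference within a group should then be shown in one column, on the row where that group starts, and the time difference between two different groups should be shown in another column, on the row where the earlier group ends.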
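The two example values are plain timestamp subtractions. A minimal check, assuming the dates are day-first (1 July 2019); the date part does not affect the result because all rows fall on the same day:

```python
import pandas as pd

# Timestamps copied from the example DataFrame
bayer_open = pd.Timestamp("2019-07-01 06:21:33")
bayer_close = pd.Timestamp("2019-07-01 06:32:01")
itm_open = pd.Timestamp("2019-07-01 06:57:33")

within_bayer = bayer_close - bayer_open  # duration of the Bayer group
bayer_to_itm = itm_open - bayer_close    # gap between Bayer closing and ITM opening

print(within_bayer)                                 # 0 days 00:10:28
print(bayer_to_itm)                                 # 0 days 00:25:32
print(round(within_bayer.total_seconds() / 60, 2))  # 10.47 (minutes)
```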
So the desired output is:
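| | Time | Action | Name | Group | Time Difference(names) | Time Difference(groups) |
|---:|:---------------------|:-------|:------|------:|:------|:--------|
| 0 | 01.07.2019, 06:21:33 | Opened | Bayer | 1 | 10:28 | |
| 1 | 01.07.2019, 06:32:01 | Closed | Bayer | 1 | | 25:32 |
| 2 | 01.07.2019, 06:57:33 | Opened | ITM | 2 | 27:00 | |
| 3 | 01.07.2019, 07:24:33 | Closed | ITM | 2 | | 1:01:52 |
| 4 | 01.07.2019, 08:26:25 | Opened | Geco | 3 | 46:19 | |
| 5 | 01.07.2019, 09:12:44 | Closed | Geco | 3 | | |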
How can I do that?
First convert the strings to datetimes, then do some grouping and differencing:
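```python
# Parse the strings into datetime64 values (month-first by default, hence 2019-01-07 below)
df["Time"] = pd.to_datetime(df["Time"])
# d1: duration of each name's Opened -> Closed pair; diff() puts it on the Closed row,
# shift(-1) moves it up to the Opened row, fillna("") blanks the remaining cells
df["d1"] = df.groupby("Name")["Time"].diff().shift(-1).fillna("")
# d2: gap between one group's Closed row and the next group's Opened row,
# shown on the Closed row
df["d2"] = (
    df.groupby((df["Action"] == "Closed").cumsum())["Time"]
    .diff()
    .shift(-1)
    .fillna("")
)
```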
which produces:
| | Time | Action | Name | Group | d1 | d2 |
|---:|:--------------------|:---------|:-------|--------:|:----------------|:----------------|
| 0 | 2019-01-07 06:21:33 | Opened | Bayer | 1 | 0 days 00:10:28 | |
| 1 | 2019-01-07 06:32:01 | Closed | Bayer | 1 | | 0 days 00:25:32 |
| 2 | 2019-01-07 06:57:33 | Opened | ITM | 2 | 0 days 00:27:00 | |
| 3 | 2019-01-07 07:24:33 | Closed | ITM | 2 | | 0 days 01:01:52 |
| 4 | 2019-01-07 08:26:25 | Opened | Geco | 3 | 0 days 00:46:19 | |
| 5 | 2019-01-07 09:12:44 | Closed | Geco | 3 | | |
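The table above keeps raw `Timedelta` objects, and the dates were parsed month-first (2019-01-07). A sketch that gets closer to the question's desired output, assuming the dates are day-first (dd.mm.yyyy): it passes an explicit `format` to `pd.to_datetime` and renders the differences as `M:SS` / `H:MM:SS` strings. The helper `fmt_td` is hypothetical, and the column names are taken from the question's desired output:

```python
import pandas as pd

d = {
    "Time": ["01.07.2019, 06:21:33", "01.07.2019, 06:32:01", "01.07.2019, 06:57:33",
             "01.07.2019, 07:24:33", "01.07.2019, 08:26:25", "01.07.2019, 09:12:44"],
    "Action": ["Opened", "Closed", "Opened", "Closed", "Opened", "Closed"],
    "Name": ["Bayer", "Bayer", "ITM", "ITM", "Geco", "Geco"],
    "Group": ["1", "1", "2", "2", "3", "3"],
}
df = pd.DataFrame(data=d)

# Assumption: dates are day-first; an explicit format stops pandas from guessing month-first
df["Time"] = pd.to_datetime(df["Time"], format="%d.%m.%Y, %H:%M:%S")


def fmt_td(td):
    """Hypothetical helper: render a Timedelta as M:SS or H:MM:SS, '' for NaT."""
    if pd.isna(td):
        return ""
    total = int(td.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{seconds:02d}"
    return f"{minutes}:{seconds:02d}"


# Same groupby/diff/shift logic as the answer; Timedeltas are kept until formatting
names_td = df.groupby("Name")["Time"].diff().shift(-1)
groups_td = df.groupby((df["Action"] == "Closed").cumsum())["Time"].diff().shift(-1)

df["Time Difference(names)"] = names_td.map(fmt_td)
df["Time Difference(groups)"] = groups_td.map(fmt_td)
print(df)
```

Keeping `names_td` and `groups_td` as `timedelta64` until the last step means they can still be used for arithmetic (for example `names_td.dt.total_seconds() / 60` for minutes), which `fillna("")` on the original column would prevent.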
To explain the `d2` calculation a little: `(df['Action'] == 'Closed').cumsum()` increases by 1 at every new `'Closed'` row. For clarity, here it is printed next to `Action`, using:
```python
df['d2_cond'] = (df['Action'] == 'Closed').cumsum()
df[['Action', 'd2_cond']]
```
which prints:
```
   Action  d2_cond
0  Opened        0
1  Closed        1
2  Opened        1
3  Closed        2
4  Opened        2
5  Closed        3
```
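So we can group by this series, which puts each `Closed` row in the same group as the `Opened` row that follows it. `diff()` within such a group gives the gap between one name closing and the next one opening, and `shift(-1)` moves that value up onto the `Closed` row.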
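To see the intermediate values of the `d2` steps side by side, here is a small sketch using only the `Time` and `Action` columns (times entered directly as day-first timestamps, an assumption that does not affect the differences); `key` and `steps` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2019-07-01 06:21:33", "2019-07-01 06:32:01", "2019-07-01 06:57:33",
        "2019-07-01 07:24:33", "2019-07-01 08:26:25", "2019-07-01 09:12:44",
    ]),
    "Action": ["Opened", "Closed", "Opened", "Closed", "Opened", "Closed"],
})

key = (df["Action"] == "Closed").cumsum()  # 0, 1, 1, 2, 2, 3
gap = df.groupby(key)["Time"].diff()       # Opened time minus the preceding Closed time

steps = pd.DataFrame({
    "Action": df["Action"],
    "key": key,
    "diff": gap,               # the value lands on the Opened row
    "shifted": gap.shift(-1),  # shift(-1) moves it up to the Closed row
})
print(steps)
```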