简体   繁体   English

计算每个组的两个不同行之间的时间差(以小时为单位),在一列中具有“滞后”

[英]Calculating time difference in hours between two different rows per gorup with a 'lag' in one column

I have a dataset similar to this.我有一个类似的数据集。

df = pd.DataFrame({'CODE':["PNR1", "PNR1", "PNR1", "PNR1","PNR2","PNR2","PNR3" ],
        'FROM': ["RDM", "DEN", "FLL", "IAH", "AEP", "COR", "DEL"],
            'TO' : [ "DEN", "FLL", "IAH","SFO", "COR", "AEP", "BOM"],
        'DEPART':["13-07-2021 13:00:00","13-07-2021 17:50:00","15-07-2021 17:15:00","15-07-2021 19:40:00", "22-09-2021 09:05:00", "22-09-2021 17:05:00", "22-09-2021 06:05:00"],
        'ARRIVAL':["13-07-2021 16:26:00","13-07-2021 23:39:00","15-07-2021 18:52:00", "15-07-2021 21:27:00","22-09-2021 10:30:00","22-09-2021 18:20:00","22-09-2021 08:05:00"]})
df['NUMSTOPS'] = df.groupby('CODE')['CODE'].transform('count')
df['DEPART'] = pd.to_datetime(df['DEPART'], format='%d-%m-%Y %H:%M:%S')
df['ARRIVAL'] = pd.to_datetime(df['ARRIVAL'], format='%d-%m-%Y %H:%M:%S')

I want to find the time difference for each group to find out the time difference of 'DEPART' from the previous arrival and show it in hours.我想找到每个组的时差,以找出“DEPART”与上次到达的时差,并以小时为单位显示。 I have applied the following:我已经应用了以下内容:

df['LAYOVER']=df.groupby('CODE').apply(lambda x:x['DEPART']- x['ARRIVAL']).shift(1).fillna('0').reset_index(drop=True)

But that gives me the following output:但这给了我以下 output:

    CODE       FROM TO   DEPART           ARRIVAL               NUMSTOPS        LAYOVER
0   PNR1    RDM DEN 2021-07-13 13:00:00 2021-07-13 16:26:00 4   00:00:00
1   PNR1    DEN FLL 2021-07-13 17:50:00 2021-07-13 23:39:00 4   -1 days +20:34:00
2   PNR1    FLL IAH 2021-07-15 17:15:00 2021-07-15 18:52:00 4   -1 days +18:11:00
3   PNR1    IAH SFO 2021-07-15 19:40:00 2021-07-15 21:27:00 4   -1 days +22:23:00
4   PNR2    AEP COR 2021-09-22 09:05:00 2021-09-22 10:30:00 2   -1 days +22:13:00
5   PNR2    COR AEP 2021-09-22 17:05:00 2021-09-22 18:20:00 2   -1 days +22:35:00
6   PNR3    DEL BOM 2021-09-22 06:05:00 2021-09-22 08:05:00 1   -1 days +22:45:00

I don't understand what am I doing wrong and how to proceed with calculating the departure time and the previous arrival for each group.我不明白我做错了什么以及如何继续计算每个小组的出发时间和之前的到达时间。

Expected output:预期 output:

预期产出

You have a parentheses out of place.你有一个括号不合适。 .shift() was outside your parenthesis. .shift() 在你的括号之外。 fixed here.固定在这里。

df['LAYOVER']=df.groupby('CODE').apply(lambda x:x['DEPART']- x['ARRIVAL'].shift(1)).fillna('0').reset_index(drop=True)
df['LAYOVER'].apply(lambda x: pd.Timedelta(x).seconds)

0        0
1     5040
2    63360
3     2880
4        0
5    23700
6        0
Name: LAYOVER, dtype: int64

this is more concise:这更简洁:

df.groupby('CODE').apply(lambda x:x['DEPART']- x['ARRIVAL'].shift(1)).fillna('0').reset_index(drop=True).dt.seconds

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM