How to calculate time difference between specific row values in dataframe using python?

The df looks like below:


Time                    A 

2019-05-18 01:15:28     7
2019-05-18 01:28:11     7
2019-05-18 01:36:36     12
2019-05-18 01:39:47     12
2019-05-18 01:53:32     12
2019-05-18 02:05:37     7

I understand how to calculate the time difference between consecutive rows. But I want to calculate the time difference whenever the value in A changes from 7 to 12.

Expected output:


Time                    A   Time_difference

2019-05-18 01:15:28     7   0
2019-05-18 01:28:11     7   0
2019-05-18 01:36:36     12  00:21:08
2019-05-18 01:39:47     12  0
2019-05-18 01:53:32     12  0
2019-05-18 02:05:37     7   0

You can isolate any values in a dataframe using loc. What gets returned is a Series, and its .values attribute is an array that can be indexed like a list. Use .values[0] to get the first occurrence in the Series.

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [9, 7, 7, 5, 12, 12]

df = pd.DataFrame({'times': times, 'a': a})
df['times'] = pd.to_datetime(df['times'])
# first time where a == 12 minus first time where a == 7
pd.Timedelta(df.loc[df.a == 12, 'times'].values[0] - df.loc[df.a == 7, 'times'].values[0])

Timedelta('0 days 00:25:21')

Or we can break that code apart for readability's sake and do the calculation on new variables:

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [9, 7, 7, 5, 12, 12]

df = pd.DataFrame({'times':times, 'a':a})
df['times'] = pd.to_datetime(df['times'])
end = df.loc[df.a == 12, 'times'].values[0]    # first time where a == 12
start = df.loc[df.a == 7, 'times'].values[0]   # first time where a == 7
pd.Timedelta(end - start)

Timedelta('0 days 00:25:21')
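
Note that .values[0] raises an IndexError when no row matches. A minimal sketch of a guarded version, assuming the same column names as above (first_time_between is a hypothetical helper name, not part of pandas):

```python
import pandas as pd

def first_time_between(df, start_value, end_value, value_col="a", time_col="times"):
    """Time from the first row equal to start_value to the first row equal to end_value.

    Returns None if either value never occurs.
    """
    starts = df.loc[df[value_col] == start_value, time_col]
    ends = df.loc[df[value_col] == end_value, time_col]
    if starts.empty or ends.empty:
        return None
    # .iloc[0] is positional, so it works whatever the index labels are
    return ends.iloc[0] - starts.iloc[0]

df = pd.DataFrame({
    "times": pd.to_datetime([
        "2019-05-18 01:15:28", "2019-05-18 01:28:11", "2019-05-18 01:36:36",
        "2019-05-18 01:39:47", "2019-05-18 01:53:32", "2019-05-18 02:05:37",
    ]),
    "a": [9, 7, 7, 5, 12, 12],
})

result = first_time_between(df, 7, 12)
print(result)    # 0 days 00:25:21

missing = first_time_between(df, 7, 99)
print(missing)   # None
```

Like the answer above, this only looks at the first 7 and the first 12 in the whole frame; it does not check that the 12 comes after the 7.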

Sample:

import pandas as pd

times = [
    '2019-05-18 01:15:28',
    '2019-05-18 01:28:11',
    '2019-05-18 01:36:36',
    '2019-05-18 01:39:47',
    '2019-05-18 01:53:32',
    '2019-05-18 02:05:37'
]

a = [7, 7, 12, 7, 12, 7]

df = pd.DataFrame({'times': pd.to_datetime(times), 'A':a})
print (df)
                times   A
0 2019-05-18 01:15:28   7
1 2019-05-18 01:28:11   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7

First create a default index and keep only the rows with 7 and 12:

df = df.reset_index(drop=True)
df1 = df[df['A'].isin([7, 12])]

Then keep only the first row of each run of consecutive equal values by comparing with the shifted values:

df1 = df1[df1['A'].ne(df1['A'].shift())]
print (df1)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12
5 2019-05-18 02:05:37   7
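
To see what the comparison with the shifted column does, here is a small sketch on a standalone Series (the values are made up for illustration):

```python
import pandas as pd

s = pd.Series([7, 7, 12, 12, 12, 7])

# s.shift() moves every value down one row; the first row becomes NaN.
# ne() is then True wherever a value differs from the one before it,
# i.e. at the first row of each run of equal values.
run_starts = s.ne(s.shift())

print(run_starts.tolist())     # [True, False, True, False, False, True]
print(s[run_starts].tolist())  # [7, 12, 7]
```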

Then keep each 7 that is immediately followed by a 12, together with that 12:

m1 = df1['A'].eq(7) & df1['A'].shift(-1).eq(12)
m2 = df1['A'].eq(12) & df1['A'].shift().eq(7)

df2 = df1[m1 | m2]
print (df2)
                times   A
0 2019-05-18 01:15:28   7
2 2019-05-18 01:36:36  12
3 2019-05-18 01:39:47   7
4 2019-05-18 01:53:32  12

Split the filtered rows by position, so that the even positions hold the 7 rows and the odd positions hold the matching 12 rows:

out7 = df2.iloc[::2]
out12 = df2.iloc[1::2]

Finally, subtract:

df['Time_difference'] = out12['times'] - out7['times'].to_numpy()
df['Time_difference'] = df['Time_difference'].fillna(pd.Timedelta(0))
print (df)
                times   A Time_difference
0 2019-05-18 01:15:28   7        00:00:00
1 2019-05-18 01:28:11   7        00:00:00
2 2019-05-18 01:36:36  12        00:21:08
3 2019-05-18 01:39:47   7        00:00:00
4 2019-05-18 01:53:32  12        00:13:45
5 2019-05-18 02:05:37   7        00:00:00
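
The steps above can be collected into one function. This is only a sketch under the same assumptions as the answer (time_from_to and its parameters are illustrative names, and start and end are assumed to be different values). It writes the difference on each 12 that follows a run of 7s, measured from the first 7 of that run:

```python
import pandas as pd

def time_from_to(df, time_col="times", value_col="A", start=7, end=12):
    out = df.reset_index(drop=True)
    # keep only the start/end rows, then the first row of each run
    sub = out[out[value_col].isin([start, end])]
    sub = sub[sub[value_col].ne(sub[value_col].shift())]
    # a start row directly followed by an end row, and that end row
    is_start = sub[value_col].eq(start) & sub[value_col].shift(-1).eq(end)
    is_end = sub[value_col].eq(end) & sub[value_col].shift().eq(start)
    starts = sub.loc[is_start, time_col]
    ends = sub.loc[is_end, time_col]
    # starts and ends pair up one to one, in order
    diff = ends - starts.to_numpy()
    out["Time_difference"] = diff.reindex(out.index).fillna(pd.Timedelta(0))
    return out

df = pd.DataFrame({
    "times": pd.to_datetime([
        "2019-05-18 01:15:28", "2019-05-18 01:28:11", "2019-05-18 01:36:36",
        "2019-05-18 01:39:47", "2019-05-18 01:53:32", "2019-05-18 02:05:37",
    ]),
    "A": [7, 7, 12, 7, 12, 7],
})

result = time_from_to(df)
print(result)
```

Rows with other values in A are dropped before the runs are compared, so a sequence such as 7, 5, 12 still counts as a change from 7 to 12.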

Explanation:

  • (df["A"] == 7).cumsum() splits the rows into groups, with a new group starting at every 7
  • for each group, if it contains a 12, take the time difference between the first row of the group and the first row with 12
  • if it does not, pass the time of the group's first row on to the next group, until a 12 is found
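
As a small illustration of the grouping step (the values are made up), the cumulative sum of the boolean mask gives every row a group number that increases at each 7:

```python
import pandas as pd

a = pd.Series([5, 7, 12, 5, 7, 5, 7, 12])

# True counts as 1, so the running total steps up at every 7
group_id = (a == 7).cumsum()

print(group_id.tolist())  # [0, 1, 1, 1, 2, 2, 3, 3]
```

Rows before the first 7 fall into group 0. Group 2 contains no 12, so under the rule above its start time would be handed on to group 3.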

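import pandas as pd
import numpy as np

np.random.seed(10)
date_range = pd.date_range("2019-09-25", "2019-09-27", freq="3h")
df = pd.DataFrame({'Time': date_range, 'A': np.random.choice([5, 7, 12], len(date_range))})

# group number that increases by one at every row where A == 7
df["Seven"] = (df["A"] == 7).cumsum()

# start time carried over from earlier groups that contained no 12
pass_to_next_group = {"val": None}

def diff(group):
    group = group.copy()
    group["Diff"] = pd.Timedelta(0)
    loc = group.index[group["A"] == 12]

    carried = pass_to_next_group["val"]
    time_a = carried if carried is not None else group["Time"].iloc[0]
    pass_to_next_group["val"] = None

    if group.name > 0 and len(loc) > 0:
        # time of the first 12 minus the start time, so the result is positive
        group.loc[loc[0], "Diff"] = group.loc[loc[0], "Time"] - time_a
    elif group.name > 0:
        # no 12 in this group: carry the start time to the next group
        pass_to_next_group["val"] = time_a
    # group 0 holds the rows before the first 7, so it has no start time

    return group


df.groupby("Seven", group_keys=False).apply(diff)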
