How to find the start time and end time of an event in Python?
I have a data frame with two columns: column 1 is the event and column 2 is the datetime:
Sample data
Event Time
0 2020-02-12 11:00:00
0 2020-02-12 11:30:00
2 2020-02-12 12:00:00
1 2020-02-12 12:30:00
0 2020-02-12 13:00:00
0 2020-02-12 13:30:00
0 2020-02-12 14:00:00
1 2020-02-12 14:30:00
0 2020-02-12 15:00:00
0 2020-02-12 15:30:00
And I want to find the start time and end time of each event:
Desired data
Event EventStartTime EventEndTime
0 2020-02-12 11:00:00 2020-02-12 12:00:00
2 2020-02-12 12:00:00 2020-02-12 12:30:00
1 2020-02-12 12:30:00 2020-02-12 13:00:00
0 2020-02-12 13:00:00 2020-02-12 14:30:00
1 2020-02-12 14:30:00 2020-02-12 15:00:00
Note: EventEndTime is the time at which the event value changes, for example from 1 to 0 or to any other value, or vice versa.
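For reference, the sample data can be reproduced with a short snippet like the one below. This is only a sketch: the name df and the use of pd.date_range to generate the half-hourly timestamps are assumptions, not part of the original question.

```python
import pandas as pd

# Sample data from the question. The variable name `df` is an assumption.
df = pd.DataFrame({
    "Event": [0, 0, 2, 1, 0, 0, 0, 1, 0, 0],
    # Ten timestamps, 30 minutes apart, starting at 11:00.
    "Time": pd.date_range("2020-02-12 11:00:00", periods=10, freq="30min"),
})
print(df)
```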
Here is a method that gets the result without a for loop. I assume that the input data has been read into a dataframe called df:
import pandas as pd
# Initialize the output df
dfout = pd.DataFrame()
dfout['Event'] = df['Event']
dfout['EventStartTime'] = df['Time']
Now, I create a column called 'change' that tells you whether the event changed from the previous row. Any non-zero value means the event changed.
dfout['change'] = df['Event'].diff()
This is how dfout looks now:
Event EventStartTime change
0 0 2020-02-12 11:00:00 NaN
1 0 2020-02-12 11:30:00 0.0
2 2 2020-02-12 12:00:00 2.0
3 1 2020-02-12 12:30:00 -1.0
4 0 2020-02-12 13:00:00 -1.0
5 0 2020-02-12 13:30:00 0.0
6 0 2020-02-12 14:00:00 0.0
7 1 2020-02-12 14:30:00 1.0
8 0 2020-02-12 15:00:00 -1.0
9 0 2020-02-12 15:30:00 0.0
Now, I remove the rows where the event did not change. The first row is kept because its 'change' value is NaN, which is not equal to 0:
dfout = dfout.loc[dfout['change'] != 0, :].copy()
This leaves only the rows where the event has changed.
Next, the end time of the current event is the start time of the next event.
dfout['EventEndTime'] = dfout['EventStartTime'].shift(-1)
The dataframe looks like this:
Event EventStartTime change EventEndTime
0 0 2020-02-12 11:00:00 NaN 2020-02-12 12:00:00
2 2 2020-02-12 12:00:00 2.0 2020-02-12 12:30:00
3 1 2020-02-12 12:30:00 -1.0 2020-02-12 13:00:00
4 0 2020-02-12 13:00:00 -1.0 2020-02-12 14:30:00
7 1 2020-02-12 14:30:00 1.0 2020-02-12 15:00:00
8 0 2020-02-12 15:00:00 -1.0 NaN
You may choose to remove the 'change' column, and also the last row, if they are not needed.
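The steps above can be condensed into one short sketch. It swaps diff() for a comparison against shift(), which is a substitution on my part rather than the original method; the advantage is that it also works when the event values are not numeric. The names is_start and out are assumptions, and the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question (rebuilt here so the snippet is self-contained).
df = pd.DataFrame({
    "Event": [0, 0, 2, 1, 0, 0, 0, 1, 0, 0],
    "Time": pd.date_range("2020-02-12 11:00:00", periods=10, freq="30min"),
})

# A row starts a new run when its event differs from the previous row's event.
# The first row compares against NaN, so it always counts as a start.
is_start = df["Event"] != df["Event"].shift()

# Keep only the rows that start a run.
out = df.loc[is_start, ["Event", "Time"]].rename(columns={"Time": "EventStartTime"})

# Each run ends where the next run starts.
out["EventEndTime"] = out["EventStartTime"].shift(-1)

# Drop the last run, which has no known end time yet.
out = out.dropna(subset=["EventEndTime"]).reset_index(drop=True)
print(out)
```

Keeping the dropna line matches the desired output in the question; removing it keeps the final, still-open event with a missing end time.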
Assuming the dataframe is data:
import pandas

current_event = None
result = []
for event, time in zip(data['Event'], data['Time']):
    if event != current_event:
        # The event changed: close the previous run (if any) at this time
        if current_event is not None:
            result.append([current_event, start_time, time])
        # Start tracking the new run
        current_event, start_time = event, time
data = pandas.DataFrame(result, columns=['Event','EventStartTime','EventEndTime'])
The trick is to save the current event number; if the next event number is not the same as the saved one, the saved event has ended and a new one has started. Note that the final event is never appended, because no later change marks its end.
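The same idea of closing a run whenever the event number changes can also be sketched with itertools.groupby from the standard library, which groups consecutive equal items. This is an alternative formulation, not the code above; the function name event_intervals is hypothetical.

```python
from itertools import groupby


def event_intervals(events, times):
    """Return [event, start, end] for each run of equal consecutive events.

    The last run is skipped because nothing marks its end.
    (`event_intervals` is a hypothetical helper name.)
    """
    # groupby only merges *consecutive* equal keys, which is what we need here.
    starts = [
        (event, next(group)[1])  # the time of the first row in each run
        for event, group in groupby(zip(events, times), key=lambda pair: pair[0])
    ]
    # A run ends where the next run starts.
    return [
        [event, start, next_start]
        for (event, start), (_, next_start) in zip(starts, starts[1:])
    ]


intervals = event_intervals([0, 0, 2, 1, 0, 0, 0, 1, 0, 0], list(range(10)))
print(intervals)
```

With a dataframe, the function could be called as event_intervals(data['Event'], data['Time']) and the result passed to pandas.DataFrame as in the loop version.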
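Use groupby and agg to get the first and last time for each event value. Note that grouping by Event alone merges all rows with the same event value, so separate runs of the same event are combined into one row.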
import pandas as pd
df = pd.DataFrame([['0',11],['1',12],['1',13],['0',15],['1',16],['3',11]], columns=['Event','Time'])
df.groupby(['Event']).agg(['first','last']).rename(columns={'first':'start-event','last':'end-event'})
Output:
Event start-event end-event
0 11 15
1 12 16
3 11 11
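If separate runs of the same event should stay separate, one possible variation is to group by a run counter instead of by the event value itself. The sketch below reuses the toy data from this answer; the names run_id and runs, and the output column names, are assumptions.

```python
import pandas as pd

df = pd.DataFrame(
    [['0', 11], ['1', 12], ['1', 13], ['0', 15], ['1', 16], ['3', 11]],
    columns=['Event', 'Time'],
)

# The counter increases by one every time the event differs from the previous row,
# so each run of consecutive equal events gets its own id.
run_id = (df['Event'] != df['Event'].shift()).cumsum().rename('run')

# Named aggregation: one output row per run.
runs = df.groupby(run_id).agg(
    Event=('Event', 'first'),
    start_event=('Time', 'first'),
    last_seen=('Time', 'last'),
)
print(runs)
```

Here last_seen is the last time observed within the run. To get the end time as defined in the question (the start of the next run), runs['start_event'].shift(-1) can be used instead.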