I would like to loop through the time-series DataFrame `td1` below to find the start and end points of some activities.

Definition of activities: the `Type` column consists of `medium` and `low` values (you can treat `medium` and `low` as two independent time series). For the same `X`, if `a` turns 1 for either value of `Type` (e.g., for `X == 18`, `a` becomes 1 while `Type == medium`, or `a` becomes 1 while `Type == low`), it marks the start of the activity. I would like to record the `Id` and `Timestamp` at that moment as `Start_Id` and `StartTime` respectively.

Once the activity starts, it is ongoing. While the activity is ongoing, if `a` turns 0 for both `Type` values (i.e., `medium` and `low`), it marks the end of the activity (e.g., for `X == 18`, `a` becomes 0 while `Type == medium` and `a` becomes 0 while `Type == low`, following the time series). I would like to record the `Id` and `Timestamp` at that time as `End_Id` and `EndTime` respectively.

Finally, collect all `b` values during each activity where `Type == medium` and `a == 1` into a list called `list_container`.

`td1`:
Timestamp X Y a b Type Id
0 2000-10-26 10:08:27.060 18 14 0.0 24.5 medium 18
1 2000-10-26 10:39:24.310 18 13 1.0 24.0 low 18 Start
2 2000-10-26 11:50:48.190 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1
3 2000-10-26 17:18:07.610 18 14 1.0 23.5 medium 18 ---- collect `b` value in `list_container` 1
4 2000-10-26 17:18:09.610 18 14 0.0 23.5 medium 18
5 2000-10-26 17:29:10.610 18 14 0.0 26.5 medium 18
6 2000-10-26 17:29:10.770 18 14 1.0 26.5 medium 18 ---- collect `b` value in `list_container` 1
7 2000-10-26 17:29:12.610 18 14 1.0 53.5 medium 18 ---- collect `b` value in `list_container` 1
8 2000-10-26 17:29:14.610 18 14 1.0 62.0 medium 18 ---- collect `b` value in `list_container` 1
9 2000-10-26 17:29:14.770 18 13 1.0 24.0 low 18
10 2000-10-26 17:29:16.610 18 14 1.0 64.5 medium 18 ---- collect `b` value in `list_container` 1
11 2000-10-26 17:29:18.770 18 14 0.0 64.5 medium 18
12 2000-10-26 17:29:18.770 18 13 0.0 24.0 low 18 End
13 2000-10-26 17:29:28.770 18 14 0.0 63.5 medium 18
14 2000-10-26 17:29:34.770 19 14 0.0 62.0 medium 19
15 2000-10-26 17:29:40.770 19 14 1.0 61.0 medium 19 Start
16 2000-10-26 17:29:46.770 19 14 1.0 60.0 medium 19 ---- collect `b` value in `list_container` 2
17 2000-10-26 17:32:01.180 19 13 1.0 25.0 low 19
18 2000-10-26 17:32:01.180 19 14 0.0 51.5 low 19
19 2000-10-26 17:32:35.180 19 13 0.0 50.0 medium 19 End
Reproducible example:
import pandas as pd
from pandas import Timestamp

td1 = pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
1: Timestamp('2000-10-26 10:39:24.310000'),
2: Timestamp('2000-10-26 11:50:48.190000'),
3: Timestamp('2000-10-26 17:18:07.610000'),
4: Timestamp('2000-10-26 17:18:09.610000'),
5: Timestamp('2000-10-26 17:29:10.610000'),
6: Timestamp('2000-10-26 17:29:10.770000'),
7: Timestamp('2000-10-26 17:29:12.610000'),
8: Timestamp('2000-10-26 17:29:14.610000'),
9: Timestamp('2000-10-26 17:29:14.770000'),
10: Timestamp('2000-10-26 17:29:16.610000'),
11: Timestamp('2000-10-26 17:29:18.770000'),
12: Timestamp('2000-10-26 17:29:18.770000'),
13: Timestamp('2000-10-26 17:29:28.770000'),
14: Timestamp('2000-10-26 17:29:34.770000'),
15: Timestamp('2000-10-26 17:29:40.770000'),
16: Timestamp('2000-10-26 17:29:46.770000'),
17: Timestamp('2000-10-26 17:32:01.180000'),
18: Timestamp('2000-10-26 17:32:01.180000'),
19: Timestamp('2000-10-26 17:32:35.180000')},
'X': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19},
'Y': {0: 14,
1: 13,
2: 14,
3: 14,
4: 14,
5: 14,
6: 14,
7: 14,
8: 14,
9: 13,
10: 14,
11: 14,
12: 13,
13: 14,
14: 14,
15: 14,
16: 14,
17: 13,
18: 14,
19: 13},
'a': {0: 0.0,
1: 1.0,
2: 1.0,
3: 1.0,
4: 0.0,
5: 0.0,
6: 1.0,
7: 1.0,
8: 1.0,
9: 1.0,
10: 1.0,
11: 0.0,
12: 0.0,
13: 0.0,
14: 0.0,
15: 1.0,
16: 1.0,
17: 1.0,
18: 0.0,
19: 0.0},
'b': {0: 24.5,
1: 24.0,
2: 23.5,
3: 23.5,
4: 23.5,
5: 26.5,
6: 26.5,
7: 53.5,
8: 62.0,
9: 24.0,
10: 64.5,
11: 64.5,
12: 24.0,
13: 63.5,
14: 62.0,
15: 61.0,
16: 60.0,
17: 25.0,
18: 51.5,
19: 50.0},
'Type': {0: 'medium',
1: 'low',
2: 'medium',
3: 'medium',
4: 'medium',
5: 'medium',
6: 'medium',
7: 'medium',
8: 'medium',
9: 'low',
10: 'medium',
11: 'medium',
12: 'low',
13: 'medium',
14: 'medium',
15: 'medium',
16: 'medium',
17: 'low',
18: 'low',
19: 'medium'},
'Id': {0: 18,
1: 18,
2: 18,
3: 18,
4: 18,
5: 18,
6: 18,
7: 18,
8: 18,
9: 18,
10: 18,
11: 18,
12: 18,
13: 18,
14: 19,
15: 19,
16: 19,
17: 19,
18: 19,
19: 19}})
td1
Expected output:
Start_Id StartTime End_Id EndTime list_container
18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:18.770 [23.5, 23.5, 26.5, 53.5, 62.0, 64.5]
19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0]
I tried the following for-loop, by analysing possible combinations of status before and after each iteration:
def combined_func(td1):
    td1['Timestamp'] = pd.to_datetime(td1['Timestamp'])
    td1 = td1.sort_values(by=['Id','Timestamp'])
    td1 = td1.reset_index(drop=True)
    low_on = 0 # Flag to indicate status of low
    medium_on = 0 # Flag to indicate status of medium
    my_list = []
    container_list = []
    data = []
    time_start = None
    start_Id = None
    time_end = None
    end_Id = None
    for i in range(1, len(td1.index)-1):
        if (td1.loc[i, 'Id'] == td1.loc[i-1, 'Id']) & (td1.loc[i, 'Id'] == td1.loc[i+1, 'Id']):
            if ((not low_on) & (not medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b13 = td1.loc[i, 'b']
                    my_list.append(b13)
                    medium_on = 1
                    time_start = td1.loc[i, 'Timestamp']
                    start_Id = td1.loc[i, 'Id']
                    print(f"This is start case 1 (start with medium), start_Id: {start_Id}, time_start: {time_start}")
                elif ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
                    time_start = td1.loc[i, 'Timestamp']
                    start_Id = td1.loc[i, 'Id']
                    print(f'This is start case 2 (start with low), start_Id: {start_Id}, time_start: {time_start}')
                    low_on = 1
                else:
                    continue
            elif ((not low_on) & (medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b5 = td1.loc[i, 'b']
                    my_list.append(b5)
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
                    low_on = 1
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                    b7 = td1.loc[i, 'b']
                    my_list.append(b7)
                    list_container = my_list
                    my_list = []
                    medium_on = 0
                    time_end = td1.loc[i, 'Timestamp']
                    end_Id = td1.loc[i, 'Id']
                    print(f"This is end case 1 (end with medium), end_Fid: {end_Id}, time_end: {time_end}, container_list is {container_list}")
                else:
                    continue
            elif ((low_on) & (not medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b11 = td1.loc[i, 'b']
                    my_list.append(b11)
                    medium_on = 1
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                    time_end = td1.loc[i, 'Timestamp']
                    end_Id = td1.loc[i, 'Id']
                    low_on = 0
                    print(f"This is end case 2 (end with low), end_Id: {end_Id}, time_end: {time_end}, container_list is {my_list}")
                else:
                    continue
            elif ((low_on) & (medium_on)):
                if (td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium'):
                    b1 = td1.loc[i, 'b']
                    my_list.append(b1)
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                    low_on = 0
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                    b3 = td1.loc[i, 'b']
                    my_list.append(b3)
                    list_container = my_list
                    my_list = []
                    medium_on = 0
                else:
                    continue
                data.append([start_Id, time_start, end_Id, time_end, list_container])
            else:
                continue
        else:
            continue
    data_table1 = pd.DataFrame(data, columns= ["Start_Id", "StartTime", "End_Id", "EndTime", "list_container"])
    return data_table1

output = combined_func(td1)
output
It returned:
This is start case 2 (start with low), start_Id: 18, time_start: 2000-10-26 10:39:24.310000
This is end case 2 (end with low), end_Id: 18, time_end: 2000-10-26 17:29:18.770000, container_list is []
This is start case 1 (start with medium), start_Id: 19, time_start: 2000-10-26 17:29:40.770000
Start_Id StartTime End_Id EndTime list_container
0 18 2000-10-26 10:39:24.310 None None [23.5, 23.5, 23.5]
1 18 2000-10-26 10:39:24.310 None None [26.5, 53.5, 62.0, 64.5, 64.5]
Somehow `End_Id` and `EndTime` are missing, and the `list_container` values are also off. I am not sure which steps went wrong. Any suggestions are greatly appreciated.
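When debugging a loop like the one above, it can help to compute the intended on/off state directly and compare it row by row with the flags. The snippet below is only a minimal sketch on a small hypothetical frame (`demo` is not the `td1` from the post, and the `active` column name is made up for illustration). It assumes the rows are already in time order and that a `Type` that has not been seen yet counts as off.

```python
import pandas as pd

# Hypothetical five-row frame using the same column names as td1.
demo = pd.DataFrame({
    "X":    [1, 1, 1, 1, 1],
    "Type": ["medium", "low", "medium", "medium", "low"],
    "a":    [0.0, 1.0, 1.0, 0.0, 0.0],
})

# One column per Type: the row's `a` if the row belongs to that Type, else NaN.
state = pd.DataFrame({
    t: demo["a"].where(demo["Type"] == t) for t in ["medium", "low"]
})

# Carry the last known value of each Type forward within each X.
# Assumption: a Type with no row so far is treated as off (0).
state = state.groupby(demo["X"]).ffill().fillna(0.0)

# The activity is ongoing whenever at least one Type currently has a == 1.
demo["active"] = state.eq(1).any(axis=1)
print(demo)
```

A change of `active` from False to True would correspond to a start row, and a change from True to False to an end row.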
I couldn't find a better way to do this than grouping by `X` and writing specific logic for each of the returned values according to your description.
def times(df):
    start_time = df.loc[df.a == 1, 'Timestamp'].iloc[0]
    end_time = pd.NaT
    if (df.loc[df.a == 0, 'Type'].nunique() == 2):
        end_time = (
            df.loc[df.a == 0, ['Timestamp', 'Type']]
            .drop_duplicates('Type', keep='last')
            .Timestamp
            .iloc[-1]
        )
    if (pd.notnull([start_time, end_time]).all()):
        temp = df[(df.Timestamp > start_time) & (df.Timestamp < end_time)]
        start_id, end_id = temp.Id.iloc[[0, -1]].to_list()
        list_container = temp[temp.a == 1].b.to_list()
        return pd.Series({
            'Start_Id': start_id,
            'StartTime': start_time,
            'End_Id': end_id,
            'EndTime': end_time,
            'list_container': list_container
        })

results = td1.groupby('X').apply(times)
results
# Start_Id StartTime End_Id EndTime list_container
# X
# 18 18 2000-10-26 10:39:24.310 18 2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 24.0, 64.5]
# 19 19 2000-10-26 17:29:40.770 19 2000-10-26 17:32:35.180 [60.0, 25.0]
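Note that the result above differs from the expected output in the question: it includes `b` values from `low` rows, and the `EndTime` for `X == 18` is the last row where `a == 0` rather than the first moment both types are off. As an alternative, here is a minimal sketch of a small state machine that follows the question's rules more literally. It is not the approach used above. `find_activities` and `last_a` are hypothetical names, the `Y` column is left out because the rules never use it, and the sketch assumes that the start row's own `b` is not collected (as the expected output suggests) and that an activity still open at the end of a group is dropped.

```python
import pandas as pd

# The 20 rows of td1 from the question, rebuilt compactly (Y omitted).
times = ["10:08:27.060", "10:39:24.310", "11:50:48.190", "17:18:07.610",
         "17:18:09.610", "17:29:10.610", "17:29:10.770", "17:29:12.610",
         "17:29:14.610", "17:29:14.770", "17:29:16.610", "17:29:18.770",
         "17:29:18.770", "17:29:28.770", "17:29:34.770", "17:29:40.770",
         "17:29:46.770", "17:32:01.180", "17:32:01.180", "17:32:35.180"]
td1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2000-10-26 " + t for t in times]),
    "X": [18] * 14 + [19] * 6,
    "a": [0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0],
    "b": [24.5, 24.0, 23.5, 23.5, 23.5, 26.5, 26.5, 53.5, 62.0, 24.0,
          64.5, 64.5, 24.0, 63.5, 62.0, 61.0, 60.0, 25.0, 51.5, 50.0],
    "Type": ["medium", "low", "medium", "medium", "medium", "medium", "medium",
             "medium", "medium", "low", "medium", "medium", "low", "medium",
             "medium", "medium", "medium", "low", "low", "medium"],
    "Id": [18] * 14 + [19] * 6,
})


def find_activities(df):
    """Walk each X group once, remembering the last `a` seen per Type."""
    records = []
    for _, group in df.groupby("X", sort=False):
        # Stable sort keeps the original order of rows with equal timestamps.
        group = group.sort_values("Timestamp", kind="stable")
        last_a = {}      # last `a` value seen for each Type in this group
        current = None   # activity being built, or None while idle
        for row in group.itertuples(index=False):
            last_a[row.Type] = row.a
            any_on = any(value == 1 for value in last_a.values())
            if current is None:
                if any_on:  # first Type to switch on starts the activity
                    current = {"Start_Id": row.Id, "StartTime": row.Timestamp,
                               "list_container": []}
            elif any_on:
                # Ongoing: collect b only for medium rows with a == 1.
                if row.Type == "medium" and row.a == 1:
                    current["list_container"].append(row.b)
            else:
                # Every Type seen so far is off again: close the activity.
                current.update(End_Id=row.Id, EndTime=row.Timestamp)
                records.append(current)
                current = None
        # Assumption: an activity left open at the end of a group is discarded.
    return pd.DataFrame(records, columns=["Start_Id", "StartTime", "End_Id",
                                          "EndTime", "list_container"])


result = find_activities(td1)
print(result)
```

On the sample data this should give the two rows listed under the expected output in the question. Whether ties in `Timestamp` and unfinished activities should be handled this way depends on the real data.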