Append multiple time-series data in for-loop in Python

I would like to loop through the time-series DataFrame td1 below to find the start and end points of some activities.

Definition of an activity: the Type column takes the values medium and low (you can treat medium and low as two independent time series). For a given X, the activity starts when a turns 1 for either value of Type (e.g., for X == 18, a becomes 1 while Type == medium, or a becomes 1 while Type == low). I would like to record the Id and Timestamp at that point as Start_Id and StartTime respectively.

Once the activity starts, it remains ongoing. While it is ongoing, the activity ends when a has turned 0 for both Type values (i.e., following the time series for X == 18, a becomes 0 while Type == medium and a becomes 0 while Type == low). I would like to record the Id and Timestamp at that point as End_Id and EndTime respectively.

Finally, for each activity, collect into a list called list_container all b values for which:

  1. Type == medium ; and
  2. a == 1.

td1:

    Timestamp                X  Y   a   b       Type    Id
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium  18  
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low     18  Start
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium  18  ---- collect `b` value in `list_container` 1
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium  18  ---- collect `b` value in `list_container` 1
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium  18
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium  18
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium  18  ---- collect `b` value in `list_container` 1
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium  18  ---- collect `b` value in `list_container` 1
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium  18  ---- collect `b` value in `list_container` 1
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low     18
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium  18  ---- collect `b` value in `list_container` 1
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium  18
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low     18  End
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium  18
14  2000-10-26 17:29:34.770 19  14  0.0 62.0    medium  19
15  2000-10-26 17:29:40.770 19  14  1.0 61.0    medium  19  Start
16  2000-10-26 17:29:46.770 19  14  1.0 60.0    medium  19  ---- collect `b` value in `list_container` 2
17  2000-10-26 17:32:01.180 19  13  1.0 25.0    low     19
18  2000-10-26 17:32:01.180 19  14  0.0 51.5    low     19
19  2000-10-26 17:32:35.180 19  13  0.0 50.0    medium  19  End

reproducible example:

import pandas as pd
from pandas import Timestamp

td1 = pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'low',
  19: 'medium'},
 'Id': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19}})

td1

Expected output:

Start_Id  StartTime                End_Id      EndTime                  list_container
18        2000-10-26 10:39:24.310  18          2000-10-26 17:29:18.770  [23.5, 23.5, 26.5, 53.5, 62.0, 64.5]
19        2000-10-26 17:29:40.770  19          2000-10-26 17:32:35.180  [60.0]

I tried the following for-loop, by analysing possible combinations of status before and after each iteration:

def combined_func(td1):

    td1['Timestamp'] = pd.to_datetime(td1['Timestamp'])
    td1 = td1.sort_values(by=['Id','Timestamp'])
    td1 = td1.reset_index(drop=True)

    low_on = 0     # Flag to indicate status of low
    medium_on = 0  # Flag to indicate status of medium
    my_list = []
    container_list = []
    data = []
    time_start = None
    start_Id = None
    time_end = None  
    end_Id = None  

    for i in range(1, len(td1.index)-1):

        if  (td1.loc[i, 'Id'] == td1.loc[i-1, 'Id']) & (td1.loc[i, 'Id'] == td1.loc[i+1, 'Id']): 
            
            if ((not low_on) & (not medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b13 = td1.loc[i, 'b']
                    my_list.append(b13)
                    medium_on = 1

                    time_start = td1.loc[i, 'Timestamp']
                    start_Id =  td1.loc[i, 'Id']
                    print(f"This is start case 1 (start with medium), start_Id: {start_Id}, time_start: {time_start}")

                elif ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):

                    time_start = td1.loc[i, 'Timestamp']
                    start_Id =  td1.loc[i, 'Id']

                    print(f'This is start case 2 (start with low), start_Id: {start_Id}, time_start: {time_start}')
                    low_on = 1

                else:
                    continue

            elif ((not low_on) & (medium_on)):     
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b5 = td1.loc[i, 'b']
                    my_list.append(b5)

                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'low')):
                    low_on = 1

                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                    b7 = td1.loc[i, 'b']

                    my_list.append(b7)
                    list_container = my_list
                    my_list = []
                    medium_on = 0

                    time_end = td1.loc[i, 'Timestamp']
                    end_Id =  td1.loc[i, 'Id']
                    
                    print(f"This is end case 1 (end with medium), end_Fid: {end_Id}, time_end: {time_end}, container_list is {container_list}")

                else:
                    continue

            elif ((low_on) & (not medium_on)):
                if ((td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium')):
                    b11 = td1.loc[i, 'b']
                    my_list.append(b11)
                    medium_on = 1
        
                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                    
                    time_end = td1.loc[i, 'Timestamp']
                    end_Id =  td1.loc[i, 'Id']
                    
                    low_on = 0
                    print(f"This is end case 2 (end with low), end_Id: {end_Id}, time_end: {time_end}, container_list is {my_list}")

                else:
                    continue

            elif ((low_on) & (medium_on)):

                if (td1.loc[i, 'a'] == 1) & (td1.loc[i, 'Type'] == 'medium'):
                    b1 = td1.loc[i, 'b']
                    my_list.append(b1)

                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'low')):
                    low_on = 0

                if ((td1.loc[i, 'a'] == 0) & (td1.loc[i, 'Type'] == 'medium')):
                    b3 = td1.loc[i, 'b']
                    my_list.append(b3)
                    list_container = my_list
                    my_list = []
                    medium_on = 0

                else:
                    continue

                data.append([start_Id, time_start, end_Id, time_end,  list_container])

            else:
                continue
        else:
            continue

    data_table1 = pd.DataFrame(data, columns= ["Start_Id", "StartTime",  "End_Id", "EndTime", "list_container"])
    
    return data_table1

output = combined_func(td1)
output

It returned:

This is start case 2 (start with low), start_Id: 18, time_start: 2000-10-26 10:39:24.310000
This is end case 2 (end with low), end_Id: 18, time_end: 2000-10-26 17:29:18.770000, container_list is []
This is start case 1 (start with medium), start_Id: 19, time_start: 2000-10-26 17:29:40.770000

    Start_Id    StartTime                   End_Id  EndTime list_container
0   18          2000-10-26 10:39:24.310     None    None    [23.5, 23.5, 23.5]
1   18          2000-10-26 10:39:24.310     None    None    [26.5, 53.5, 62.0, 64.5, 64.5]

Somehow End_Id and EndTime are missing, and the list_container values are also off. I am not sure which step went wrong. Any suggestions are greatly appreciated.
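
One thing that can be checked independently of the flag logic is which rows the loop actually examines. The sketch below is only a diagnostic aid: it assumes the same Id sequence as td1 and uses a hypothetical helper, visited_rows, that replays nothing but the range(1, len - 1) bounds and the neighbouring-Id test from combined_func.

```python
# Id column of td1 after sorting: rows 0-13 have Id 18, rows 14-19 have Id 19.
ids = [18] * 14 + [19] * 6


def visited_rows(ids):
    """Hypothetical helper: indices that pass the loop bounds and the Id test."""
    n = len(ids)
    return [
        i
        for i in range(1, n - 1)  # same bounds as the original loop
        if ids[i] == ids[i - 1] and ids[i] == ids[i + 1]
    ]


visited = visited_rows(ids)
skipped = sorted(set(range(len(ids))) - set(visited))
print(skipped)  # [0, 13, 14, 19]
```

Rows 0, 13, 14 and 19 are never examined. Row 19 is the one marked End for Id 19 in the table, so that end point cannot be detected by the loop as written.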

I couldn't find a better way to do this than grouping by X and writing specific logic for each of the returned values, according to your description.

def times(df):
    
    start_time = df.loc[df.a == 1, 'Timestamp'].iloc[0]
    end_time = pd.NaT
    
    if(df.loc[df.a == 0, 'Type'].nunique() == 2):
        end_time = (
            df.loc[df.a == 0, ['Timestamp', 'Type']]
            .drop_duplicates('Type', keep='last')
            .Timestamp
            .iloc[-1]
        )
        
    if (pd.notnull([start_time, end_time]).all()):
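        # Rows strictly between the start and end timestamps
        temp = df[(df.Timestamp > start_time) & (df.Timestamp < end_time)]
        start_id, end_id = temp.Id.iloc[[0, -1]].to_list()
        # Keep only b values where a == 1 and Type == 'medium', as the question asks
        list_container = temp[(temp.a == 1) & (temp.Type == 'medium')].b.to_list()
        
        return pd.Series({
            'Start_Id': start_id,
            'StartTime': start_time,
            'End_Id': end_id,
            'EndTime': end_time,
            'list_container': list_container
        })
    
results = td1.groupby('X').apply(times)
results

#       Start_Id  StartTime                 End_Id  EndTime                 list_container
# X                 
# 18    18        2000-10-26 10:39:24.310   18      2000-10-26 17:29:28.770 [23.5, 23.5, 26.5, 53.5, 62.0, 64.5]
# 19    19        2000-10-26 17:29:40.770   19      2000-10-26 17:32:35.180 [60.0]
#
# Note: EndTime for X == 18 is the timestamp of the last a == 0 row (17:29:28.770),
# not of the row where both types first reach 0 (17:29:18.770).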
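
The grouped approach above reports 17:29:28.770 as EndTime for X == 18, whereas the question expects 17:29:18.770, the row where both types have first returned to 0. A minimal sketch of an alternative is to walk each X group row by row and remember the latest a value per Type. The function name find_activities and its internals are hypothetical and not part of the original answer. It assumes that a Type not yet seen in a group counts as 0, that an activity still open at the end of a group is dropped, and (inferred from the expected output for X == 19) that the b value on the start row itself is not collected.

```python
import pandas as pd


def find_activities(df):
    """Hypothetical helper: one output row per completed activity."""
    # A stable sort keeps the original order of rows that share a timestamp.
    df = df.sort_values("Timestamp", kind="stable")
    records = []
    for _, group in df.groupby("X"):
        state = {"medium": 0, "low": 0}  # latest `a` seen for each Type
        ongoing = False
        start_id = start_time = None
        collected = []
        for row in group.itertuples(index=False):
            state[row.Type] = int(row.a)
            if not ongoing:
                if row.a == 1:  # either Type turning 1 starts an activity
                    ongoing = True
                    start_id, start_time = row.Id, row.Timestamp
                    collected = []
                continue  # assumption: the start row's b is not collected
            if row.Type == "medium" and row.a == 1:
                collected.append(row.b)
            if state["medium"] == 0 and state["low"] == 0:
                # Both series are back at 0: the activity ends on this row.
                records.append(
                    [start_id, start_time, row.Id, row.Timestamp, collected]
                )
                ongoing = False
    return pd.DataFrame(
        records,
        columns=["Start_Id", "StartTime", "End_Id", "EndTime", "list_container"],
    )
```

Calling find_activities(td1) on the example frame should give one row per X, ending at 17:29:18.770 for X == 18 and at 17:32:35.180 for X == 19. For large frames the per-row Python loop will be slower than a vectorised solution, so treat this as a readable starting point rather than a tuned implementation.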
