
How to compare consecutive elements in a list of dictionaries and merge elements based on a condition?

I have a list of dictionaries that we'll call "sessions". Each session has a list of "parts", a session mode, and start and end timestamps:

sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]

If consecutive sessions have the same "session_mode" then the parts in the matching sessions need to merge into the same element, so I want the output to look like this:

sessions_arr = [ 
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4, 5, 6], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]

Notice that the TS_end of the second element was updated as well. I always want to merge into the earlier session, and only when the matching sessions are consecutive.

This is what I have so far:

for i in range(len(sessions_arr_copy) - 1):
    if sessions_arr[i]['session_mode'] == sessions_arr[i + 1]['session_mode']:
        # if they match move that session into the one before it
        sessions_arr[i]['parts'].extend(sessions_arr[i+1]['parts'])
        sessions_arr[i]['TS_end'] = sessions_arr[i+1]['TS_end']
        sessions_arr.pop(i+1)

The issue with this implementation is that popping the element I just merged into the previous one changes the size of the list I am iterating over. This raises an IndexError, and I understand why the error is occurring; I just want to know how to work around it. I would like to do this with a single for loop, since the list can get pretty big, but it doesn't have to be the fastest algorithm either.

I would break this into two parts:

  1. A function that knows how to merge a set of dicts to create a single merged dict.

  2. itertools.groupby to group the list on the key you want.

Together that might look something like:

from itertools import groupby

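def merge(dicts):
    # Flatten the 'parts' lists of every session in the group, in order.
    merged_parts = [part for line in dicts for part in line['parts']]
    # The merged session spans from the first start to the last end.
    start = dicts[0]['TS_start']
    end = dicts[-1]['TS_end']

    return {
        'parts': merged_parts,
        'session_mode': dicts[0]['session_mode'],
        'TS_start': start,
        'TS_end': end
    }
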

sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706303, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]

[merge(list(g)) for _, g in groupby(sessions_arr, key=lambda d: d['session_mode'])]

This will leave you with a new list looking like:

[
  {'parts': [1, 2], 'session_mode': 'Driving','TS_start': 1632705871,'TS_end': 1632706202},
  {'parts': [3, 4, 5, 6], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706400},
  {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]

If your groups are large, you could improve this by not requiring the creation of the temp list(g) and making the merge() function just accept an iterator.
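A minimal sketch of that idea, assuming each group is consumed exactly once and in order. The name merge_iter is hypothetical, not part of the answer above; it reads the first session from the iterator and folds the rest into it, so no list(g) is needed. It also copies 'parts', which is an added choice so that the input sessions stay unchanged.

```python
from itertools import groupby

def merge_iter(group):
    """Merge an iterable of session dicts without building a temporary list."""
    group = iter(group)
    first = next(group)  # groupby never yields an empty group
    merged = {
        'parts': list(first['parts']),  # copy so the input dict is not mutated
        'session_mode': first['session_mode'],
        'TS_start': first['TS_start'],
        'TS_end': first['TS_end'],
    }
    for session in group:
        merged['parts'].extend(session['parts'])
        merged['TS_end'] = session['TS_end']  # the last session seen sets the end
    return merged

sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706303, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]

result = [merge_iter(g) for _, g in groupby(sessions_arr, key=lambda d: d['session_mode'])]
print(result)
```

Each group must be consumed before groupby advances to the next one, which the list comprehension does here.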

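The problem occurs because you modify the list while iterating over it. I would build a new list instead and update the merged attributes there. Note that is_consecutive below also requires each session's TS_start to equal the previous session's TS_end; that holds for this sample data (the third session starts at 1632706303) but not for the data in the question (1632706304), so drop that condition if you want to merge on session_mode alone.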

sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706303, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}
]
    
helper = []
idx = 0
is_consecutive = lambda sess1, sess2: (
    sess1['session_mode'] == sess2['session_mode'] and 
    sess1['TS_end'] == sess2['TS_start']
)
helper.append(sessions_arr[0])
for item in sessions_arr[1:]:
    if is_consecutive(helper[idx], item):
        helper[idx]['parts'].extend(item['parts'])
        helper[idx]['TS_end'] = item['TS_end']
    else:
        helper.append(item)
        idx += 1

print(helper)

Output:

[{'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
 {'parts': [3, 4, 5, 6], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706400},
 {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500}]
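The loop above stores the original dicts in helper, so extending 'parts' also changes the entries of sessions_arr. Here is a minimal sketch of a variant, assuming that side effect is unwanted and that sessions should merge on session_mode alone, as the question describes. The function name merge_consecutive is hypothetical; the data is the list from the question.

```python
def merge_consecutive(sessions):
    """Merge runs of consecutive sessions that share a session_mode."""
    merged = []
    for session in sessions:
        if merged and merged[-1]['session_mode'] == session['session_mode']:
            # Same mode as the previous run: fold this session into it.
            merged[-1]['parts'].extend(session['parts'])
            merged[-1]['TS_end'] = session['TS_end']
        else:
            # New run: store a copy so the caller's dicts are left untouched.
            merged.append({**session, 'parts': list(session['parts'])})
    return merged

sessions_arr = [
    {'parts': [1, 2], 'session_mode': 'Driving', 'TS_start': 1632705871, 'TS_end': 1632706202},
    {'parts': [3, 4], 'session_mode': 'Idling', 'TS_start': 1632706203, 'TS_end': 1632706303},
    {'parts': [5, 6], 'session_mode': 'Idling', 'TS_start': 1632706304, 'TS_end': 1632706400},
    {'parts': [7], 'session_mode': 'Driving', 'TS_start': 1632706401, 'TS_end': 1632706500},
]

merged_sessions = merge_consecutive(sessions_arr)
print(merged_sessions)
```

Using merged[-1] in place of a separate index also means an empty input simply returns an empty list.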
