
Break list of dictionaries into dictionaries for common elements

The problem is a little hard to explain with just the title.

I have a huge list of dictionaries, dict_list, about 18k long. One of the keys on each of them is "PROCESS". The processes are "Etch" and "Depo", and each will repeat for a bit, then change to the other, and back. These are called "runs".

I need to group similar processes together into a list, until the process changes, then insert that list into a "runs" dictionary. Here is a better visual explanation:

dict_list = [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
             {"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
             {"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
             {"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
             {"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
             {"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
             {"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
             {"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
             {"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
             {"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
             {"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
             {"PROCESS": "Etch"},{"PROCESS": "Etch"}]

Basically, if I loop over dict_list, printing each "PROCESS" line by line, a shortened version would look like:

>>"Etch"
>>"Etch"
>>"Etch"
>>"Etch"
>>"Depo"
>>"Depo"
>>"Depo"
>>"Depo"
>>"Etch"
>>"Etch"
>>"Etch"
>>"Etch"
>>"Depo"
>>"Depo"
>>"Depo"
>>"Depo"

For that shortened example I would have 4 "runs", each holding a list of 4 dictionaries.

For the dict_list shown above, I need to group the dictionaries into lists and then into a dictionary, like so:

new_dict_list = {
    "run 1": [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"}],
    "run 2": [{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"}],
    "run 3": [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"}]
}

It would be something like:

  • Iterate over each dictionary

  • Place the first dictionary in a list, and that list into a new dictionary (we call this a run)

  • On the next iteration, if dictionary["PROCESS"] is the same, store it into the same list and same dictionary

  • If dictionary["PROCESS"] changes, store the current dictionary in a new list and then into a new dictionary

I'm just not sure how to put this into Python logic. I'm still fairly new at this.

This is what I have so far:

prev_process = ""
counter = 0
new_dict_list = {}

for dictionary in dict_list:
  if dictionary["PROCESS"] != prev_process:
    counter += 1
    prev_process = dictionary["PROCESS"]
  new_dict_list["run " + counter] = dictionary

I feel there should be a while loop in there, something like "while dictionary["PROCESS"] remains the same, do stuff", but I don't know how to write that in Python, or how to break out of it (the condition would always be true if I check it the way I do now).

You can use itertools.groupby:

import itertools
dict_list = [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"}]
new_d = {
    'run {}'.format(i): list(group)
    for i, (_, group) in enumerate(
        itertools.groupby(dict_list, key=lambda x: x["PROCESS"]), 1)
}

Output:

{'run 1': [{'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}], 
 'run 2': [{'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}], 
 'run 3': [{'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}]
}

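itertools.groupby groups consecutive elements that share the same key, which is exactly what a "run" is here. It does not sort the data or merge non-adjacent items, so the second block of "Etch" entries becomes its own group. In this case the key is the value stored under 'PROCESS'. Each step of the iteration yields a (key, group) pair, where the group is an iterator over the matching dictionaries; wrapping it in list(...) turns it into a list. To create the custom 'run {number}' key, enumerate is started at 1 and supplies the run counter in a clean manner.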
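For comparison, the loop attempted in the question can also be made to work without itertools. The sketch below is one possible version, not the only one: the function name group_runs and its key parameter are made up for illustration. It fixes the two problems in the original loop: "run " + counter fails because a str and an int cannot be concatenated, and assigning the dictionary directly overwrites the run on every iteration instead of appending to a list.

```python
def group_runs(dict_list, key="PROCESS"):
    """Group consecutive dictionaries that share the same value for `key`.

    Hypothetical helper for illustration; returns a dict such as
    {"run 1": [...], "run 2": [...]}.
    """
    runs = {}
    prev_value = object()  # sentinel that never equals a real process name
    counter = 0

    for dictionary in dict_list:
        if dictionary[key] != prev_value:
            # The process changed: start a new run with an empty list.
            counter += 1
            prev_value = dictionary[key]
            runs["run {}".format(counter)] = []
        # Append to the current run instead of overwriting it.
        runs["run {}".format(counter)].append(dictionary)

    return runs


# Small sample in the same shape as the question's data.
sample = [{"PROCESS": p} for p in ["Etch"] * 4 + ["Depo"] * 4 + ["Etch"] * 2]
result = group_runs(sample)

for name, items in result.items():
    print(name, items[0]["PROCESS"], len(items))
```

No while loop is needed: the if inside the for loop only decides when a new run starts, and every dictionary is appended to whichever run is current.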
