The problem is a little hard to explain with just the title.
I have a huge list of dictionaries, dict_list, about 18k long. One of the keys on each of them is "PROCESS". The processes are "Etch" and "Depo", and each will repeat for a bit, then change to the other, and back. These are called "runs".
I need to group similar processes together into a list, until the process changes, then insert that list into a "runs" dictionary. Here is a better visual explanation:
dict_list = [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},
{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},
{"PROCESS": "Etch"},{"PROCESS": "Etch"}]
Basically, if I loop over dict_list, printing each "PROCESS" line by line, it would look like:
>>"Etch"
>>"Etch"
>>"Etch"
>>"Etch"
>>"Depo"
>>"Depo"
>>"Depo"
>>"Depo"
>>"Etch"
>>"Etch"
>>"Etch"
>>"Etch"
>>"Depo"
>>"Depo"
>>"Depo"
>>"Depo"
For that example I would have 4 "runs" dictionaries, each with a list of 4 dictionaries.
For the dict_list above, I need to group the dictionaries into lists and store each list under its own key in a single dictionary, like so:
new_dict_list = {
"run 1": [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"}],
"run 2": [{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"}],
"run 3": [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"}]
}
It would be something like:
Iterate over each dictionary
Place the first dictionary in a list, and that list into a new dictionary (we call this a run)
On the next iteration, if dictionary["PROCESS"] is the same, store it into the same list and same dictionary
If dictionary["PROCESS"] changes, store the current dictionary in a new list and then into a new dictionary
I'm just not sure how to put this into python logic. I'm still newish at this.
This is what I have so far:
prev_process = ""
counter = 0
new_dict_list = {}
for dictionary in dict_list:
    if dictionary["PROCESS"] != prev_process:
        counter += 1
        prev_process = dictionary["PROCESS"]
    new_dict_list["run " + counter] = dictionary
I'm feeling there should be a while loop there, "while dictionary["PROCESS"] remains the same, do stuff", but I don't know how to put that into python, or how to break out (because the condition would always be true if I check it like I am now).
You can use itertools.groupby:
import itertools
dict_list = [{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Depo"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"},{"PROCESS": "Etch"}]
new_d = {'run {}'.format(i):list(b) for i, [_, b] in enumerate(itertools.groupby(dict_list, key=lambda x:x["PROCESS"]), 1)}
Output:
{'run 1': [{'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}],
'run 2': [{'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}, {'PROCESS': 'Depo'}],
'run 3': [{'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}, {'PROCESS': 'Etch'}]
}
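The one-liner above can also be unpacked into an ordinary loop, which may be easier to follow. This is only a sketch: sample is a small made-up stand-in for dict_list, and the names runs, grouped, process and group are chosen here for illustration.

```python
import itertools

# Small made-up stand-in for dict_list (not the real 18k-item data).
sample = [{"PROCESS": "Etch"}, {"PROCESS": "Etch"},
          {"PROCESS": "Depo"},
          {"PROCESS": "Etch"}, {"PROCESS": "Etch"}, {"PROCESS": "Etch"}]

runs = {}
grouped = itertools.groupby(sample, key=lambda d: d["PROCESS"])
for run_number, (process, group) in enumerate(grouped, start=1):
    # group is a lazy iterator over one consecutive stretch of equal keys;
    # list() materializes it before groupby moves on to the next stretch.
    runs["run {}".format(run_number)] = list(group)

for name, items in runs.items():
    print(name, items[0]["PROCESS"], len(items))
# run 1 Etch 2
# run 2 Depo 1
# run 3 Etch 3
```

Note that the second stretch of "Etch" becomes its own run instead of being merged into "run 1", which is the behaviour the question asks for.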
itertools.groupby groups consecutive elements that share the same key value. Here the key is the value stored under 'PROCESS', so each iteration yields a pair: the key value and an iterator over the stretch of adjacent elements that have it. Because only adjacent elements are grouped, a later stretch of "Etch" starts a new run rather than being merged into the first one. To build the custom 'run {number}' key, enumerate (starting at 1) keeps track of the run number in a clean manner.
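For comparison, the loop from the question can be made to work without itertools, and no while loop is needed. Two things stop the original attempt from working: "run " + counter raises a TypeError because a str and an int cannot be concatenated, and assigning dictionary to the key overwrites the previous value instead of collecting a list. A minimal sketch follows, where group_runs and its key parameter are hypothetical names introduced for this example:

```python
def group_runs(dict_list, key="PROCESS"):
    """Group consecutive dictionaries that share the same value for `key`."""
    runs = {}
    counter = 0
    current_run = None
    prev_value = object()  # sentinel that never equals a real value
    for dictionary in dict_list:
        if dictionary[key] != prev_value:
            # The process changed: start a new list and register it as a new run.
            counter += 1
            prev_value = dictionary[key]
            current_run = []
            runs["run " + str(counter)] = current_run
        # Same process as before (or first item of a new run): append to it.
        current_run.append(dictionary)
    return runs


# Small made-up sample standing in for the real dict_list.
sample = [{"PROCESS": "Etch"}, {"PROCESS": "Etch"},
          {"PROCESS": "Depo"}, {"PROCESS": "Depo"}, {"PROCESS": "Depo"},
          {"PROCESS": "Etch"}]
result = group_runs(sample)
print({name: len(items) for name, items in result.items()})
# {'run 1': 2, 'run 2': 3, 'run 3': 1}
```

The if block only handles the moment the process changes; every dictionary, including the first of a run, is then appended to whichever list is current.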