I have 300k+ individual dictionaries from API calls in the format below. One API call returns one dict, so each of the following dicts is the result of a successful, consecutive API call, and the code needs to process each returned dict right after its call:
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 21, 'Type': 'dwg', 'Size(MB)': 98, 'uid': 732854}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 8, 'Type': 'pdf', 'Size(MB)': 42, 'uid': 735554}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 16, 'Type': 'docx', 'Size(MB)': 104, 'uid': 746748}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 8, 'Type': 'pptx', 'Size(MB)': 57, 'uid': 731024}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Files': 8, 'Type': 'dwg', 'Size(MB)': 71, 'uid': 737328}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Files': 8, 'Type': 'docx', 'Size(MB)': 22, 'uid': 376494}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Files': 8, 'Type': 'pptx', 'Size(MB)': 28, 'uid': 687281}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Files': 8, 'Type': 'docx', 'Size(MB)': 20, 'uid': 687231}
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MET', 'Files': 20, 'Type': 'pptx', 'Size(MB)': 204, 'uid': 457281}
I have to append these individual dictionaries to a list of dictionaries, subject to the following conditions:
So, out of the above data, only the following should make it into the final list:
[{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 21, 'Type': 'dwg', 'Size(MB)': 98, 'uid': 732854},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 8, 'Type': 'pdf', 'Size(MB)': 42, 'uid': 735554},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Files': 8, 'Type': 'dwg', 'Size(MB)': 71, 'uid': 737328},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Files': 8, 'Type': 'pptx', 'Size(MB)': 28, 'uid': 687281},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MET', 'Files': 20, 'Type': 'pptx', 'Size(MB)': 204, 'uid': 457281},
...]
The code I tried:
list = []
dict = {'N.': name, 'Batch': year, 'Sem': semester, 'Sub': subject, 'Files': nofiles, 'Type': format, 'Size(MB)': size, 'uid': uniqueid}
comparekeys = ['N.','Batch','Sem','Sub']
nptype = ['docx', 'pptx', 'xlsx']
if dict not in list and format in nptype:
    for key in comparekeys:
        if dict[key] == (item[key] for item in list):
            break
list.append(dict)
The above code also appends the non-preferred formats and cannot check whether an entry already exists in the list. I also tried zip(), set() and .keys(), but couldn't work out the right code.
You said that all dicts with the same ['N.','Batch','Sem','Sub'] values are returned consecutively, one after the other, by the API calls. So I'm going to presume they are grouped together.
Use itertools.groupby() to process each group of dicts. For each group, sort the dicts so that preferred types come before non-preferred types. The first of the sorted dicts is always added to the results, because it is either a preferred type or the group has no preferred types at all. Of the remaining sorted dicts, only those with a preferred type are appended to the results.
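The presumption about grouping matters because itertools.groupby() only merges adjacent items with equal keys; it does not collect equal keys from across the whole input. A small sketch of that behaviour, using a made-up list of subject names (the `subjects` list is illustrative, not data from the question):

```python
import itertools

# Hypothetical input in which 'CAD' shows up in two separate runs.
subjects = ['CAD', 'CAM', 'CAD']

# groupby() starts a new group every time the key changes.
runs = [(key, len(list(group))) for key, group in itertools.groupby(subjects)]
print(runs)

# Sorting first brings equal keys next to each other.
sorted_runs = [(key, len(list(group)))
               for key, group in itertools.groupby(sorted(subjects))]
print(sorted_runs)
```

If the API results were not guaranteed to arrive grouped, the data would have to be sorted by the same key before grouping, which also means collecting all of it first.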
import itertools as it
data = [
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 21, 'Type': 'dwg', 'Size(MB)': 98, 'uid': 732854},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 8, 'Type': 'pdf', 'Size(MB)': 42, 'uid': 735554},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 16, 'Type': 'docx', 'Size(MB)': 104, 'uid': 746748},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAD', 'Files': 8, 'Type': 'pptx', 'Size(MB)': 57, 'uid': 731024},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Files': 8, 'Type': 'dwg', 'Size(MB)': 71, 'uid': 737328},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Files': 8, 'Type': 'docx', 'Size(MB)': 22, 'uid': 376494},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Files': 8, 'Type': 'pptx', 'Size(MB)': 28, 'uid': 687281},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Files': 8, 'Type': 'docx', 'Size(MB)': 20, 'uid': 687231},
{'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MET', 'Files': 20, 'Type': 'pptx', 'Size(MB)': 204, 'uid': 457281},
]
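preferred_types = ('dwg', 'pdf', 'bmp')
result = []
# Dicts that share these four fields belong to the same group.
key = lambda v: (v['N.'], v['Batch'], v['Sem'], v['Sub'])
for _, values in it.groupby(data, key=key):
    # False sorts before True, so preferred types come first.
    # sorted() is stable, so the original order is kept within each half.
    values = sorted(values, key=lambda v: v['Type'] not in preferred_types)
    # The first dict is a preferred type, or the first non-preferred one if the group has none.
    result.append(values[0])
    # Of the rest, keep only the preferred types.
    result.extend(value for value in values[1:] if value['Type'] in preferred_types)
for row in result:
    print(row)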
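The question mentions that each dict comes from its own API call and has to be handled as soon as it is returned. Below is a minimal sketch of the same selection rule applied one record at a time, assuming (as above) that all records of a group arrive consecutively. The `GroupFilter` class and its `add()` and `flush()` methods are hypothetical names introduced for illustration only.

```python
PREFERRED_TYPES = {'dwg', 'pdf', 'bmp'}
GROUP_KEYS = ('N.', 'Batch', 'Sem', 'Sub')


class GroupFilter:
    """Filters records fed in one by one (hypothetical helper).

    Assumes records belonging to the same group arrive consecutively.
    """

    def __init__(self):
        self.result = []
        self._key = None        # key of the group currently being collected
        self._preferred = []    # preferred-type records of the current group
        self._fallback = None   # first non-preferred record of the current group

    def add(self, record):
        key = tuple(record[k] for k in GROUP_KEYS)
        if key != self._key:
            # A new group has started: finalize the previous one.
            self.flush()
            self._key = key
        if record['Type'] in PREFERRED_TYPES:
            self._preferred.append(record)
        elif self._fallback is None:
            self._fallback = record

    def flush(self):
        """Move the current group's selection into result and reset the state."""
        if self._preferred:
            self.result.extend(self._preferred)
        elif self._fallback is not None:
            self.result.append(self._fallback)
        self._key = None
        self._preferred = []
        self._fallback = None


# Usage: call add() after every API call, then flush() once at the end.
collector = GroupFilter()
for record in [
    {'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Type': 'dwg', 'uid': 737328},
    {'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'CAM', 'Type': 'docx', 'uid': 376494},
    {'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Type': 'pptx', 'uid': 687281},
    {'N.': 'Sam', 'Batch': 2019, 'Sem': 'I', 'Sub': 'MIM', 'Type': 'docx', 'uid': 687231},
]:
    collector.add(record)
collector.flush()
print([r['uid'] for r in collector.result])
```

A group's selection only becomes final when the next group starts or flush() is called, because a preferred type may still arrive later in the same group. Alternatively, itertools.groupby() accepts any iterable, so a generator that yields one dict per API call could be passed to it in place of the data list.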