I have a list of dictionaries, each describing a file (file format, filename, filesize, ... and a full path to the file, which is always unique). The goal is to exclude all but one of the dictionaries describing copies of the same file: I want a single dict (entry) per file, no matter how many copies there are.
In other words: if two (or more) dicts differ only in a single key (i.e. path), leave only one of them.
For example, here is the source list:
src_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/mydir'},
{'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/'},
{'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]
The result should look like this:
dst_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
{'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]
Use another dictionary to map the dictionaries from the list, minus the "ignored" keys, to the actual dictionaries. This way, only one of each kind will be retained. Of course, dicts are not hashable, so you have to use tuples of the (sorted) key-value pairs as the keys instead.
src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
{'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
{'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
ignored_keys = ["path"]
filtered = {tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys): d for d in src_list}
dst_lst = list(filtered.values())
Result is:
[{'path': 'C:/mydir', 'filetype': '.txt', 'filename': 'abc'},
{'path': 'C:/mydir2', 'filetype': '.zip', 'filename': 'def'}]
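Note that the dict comprehension above keeps the last dict of each kind, because later entries overwrite earlier ones under the same key. If the first copy is preferred, a small variation using setdefault might look like the sketch below. The function name dedupe_keep_first and its ignored_keys parameter are made up for illustration, and the sketch assumes all values in the dicts are hashable.

```python
def dedupe_keep_first(dicts, ignored_keys=("path",)):
    """Return one dict per group of dicts that are equal apart from ignored_keys."""
    ignored = set(ignored_keys)
    first_seen = {}
    for d in dicts:
        # Hashable fingerprint of the dict without the ignored keys.
        key = tuple((k, d[k]) for k in sorted(d) if k not in ignored)
        # setdefault stores d only if this fingerprint has not been seen yet.
        first_seen.setdefault(key, d)
    return list(first_seen.values())


src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]

dst_list = dedupe_keep_first(src_list)
print(dst_list)
```

With the sample list, this keeps the 'C:/' entry for both 'abc' and 'def', and the source list is not modified.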
My own solution (maybe not the best, but it worked):
dst_list = []
seen_items = set()
for dictionary in src_list:
    # temporarily cut the unique key (path) out so it does not take part in the duplicate check
    path = dictionary.pop('path', None)
    # sort the items so that key order does not affect the comparison
    t = tuple(sorted(dictionary.items()))
    # put the unique key back right away so that every dict in src_list stays intact
    dictionary['path'] = path
    if t not in seen_items:
        seen_items.add(t)
        # duplicate check passed, keep this dictionary
        dst_list.append(dictionary)
print(dst_list)
Where src_list is the original list with possible duplicates, dst_list is the final duplicate-free list, and path is the unique key.
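The same idea can also be written without popping the key from the source dictionaries at all. The following is only a sketch: the function name unique_ignoring and its unique_key parameter are hypothetical, and it assumes every value in the dicts is hashable. It builds a frozenset of the remaining items, so key order does not matter and the input is never touched.

```python
def unique_ignoring(dicts, unique_key="path"):
    """Keep the first dict of every group that differs only in unique_key."""
    seen = set()
    result = []
    for d in dicts:
        # Order-independent, hashable fingerprint of d without unique_key.
        fingerprint = frozenset((k, v) for k, v in d.items() if k != unique_key)
        if fingerprint not in seen:
            seen.add(fingerprint)
            result.append(d)
    return result


src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filetype': '.txt', 'filename': 'abc', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]

dst_list = unique_ignoring(src_list)
print(dst_list)
```

Because the fingerprint is a frozenset, the second dict is still recognised as a copy of the first even though its keys are in a different order.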