
Remove duplicates from a list of dictionaries (with a unique value)

I have a list of dictionaries, each describing a file (file format, filename, filesize, ... and a full path to the file [always unique]). The goal is to exclude all but one of the dictionaries describing copies of the same file (I just want a single dict (entry) per file, no matter how many copies there are).

In other words: if 2 (or more) dicts differ only in a single key (i.e. path), leave only one of them.

For example, here is the source list:

src_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

The result should look like this:

dst_list = [{'filename': 'abc', 'filetype': '.txt', ... 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', ... 'path': 'C:/mydir2'}]

Use another dictionary to map the dictionaries from the list, minus the "ignored" keys, to the actual dictionaries. This way, only one of each kind will be retained. Of course, dicts are not hashable, so you have to use (sorted) tuples instead.

src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
ignored_keys = ["path"]
filtered = {tuple((k, d[k]) for k in sorted(d) if k not in ignored_keys): d for d in src_list}
dst_lst = list(filtered.values())

The result is:

[{'path': 'C:/mydir', 'filetype': '.txt', 'filename': 'abc'}, 
 {'path': 'C:/mydir2', 'filetype': '.zip', 'filename': 'def'}]
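A note on why 'C:/mydir' (not 'C:/') survives above: when the comprehension meets a repeated key, it overwrites the value, so the last dictionary seen in each group is the one kept. If you prefer an order-independent key without sorting, a frozenset of the remaining items works just as well, assuming all the values are hashable (they are strings here):

```python
# Variant of the answer above: frozenset keys instead of sorted tuples.
# Assumes every dictionary value is hashable.
src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]
ignored_keys = {'path'}

# Repeated keys overwrite earlier entries, so the last duplicate wins.
filtered = {frozenset((k, v) for k, v in d.items() if k not in ignored_keys): d
            for d in src_list}
dst_list = list(filtered.values())
print(dst_list)
```

As with the sorted-tuple version, this keeps the last copy of each file ('C:/mydir' and 'C:/mydir2' here); iterate over `reversed(src_list)` if you want the first copy instead.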

My own solution (maybe not the best, but it worked):

    dst_list = []
    seen_items = set()
    for dictionary in src_list:
        # here we cut the unique key (path) out to add it back later after a duplicate check
        path = dictionary.pop('path', None)
        t = tuple(dictionary.items())
        if t not in seen_items:
            seen_items.add(t)
            # duplicate check passed, adding the unique key back to its dictionary
            dictionary['path'] = path
            dst_list.append(dictionary)

    print(dst_list) 

Where

src_list is the original list with possible duplicates,

dst_list is the final duplicate-free list,

path is the unique key
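One caveat with the loop above: `dictionary.pop('path', None)` mutates the dicts inside src_list while iterating, and the path is only restored for dicts that pass the duplicate check, so any discarded duplicate permanently loses its 'path' key. A non-mutating sketch of the same idea builds the comparison key without popping anything:

```python
# Non-mutating variant of the loop above: build the comparison key by
# filtering out 'path' instead of popping it, so src_list is left untouched.
src_list = [{'filename': 'abc', 'filetype': '.txt', 'path': 'C:/'},
            {'filename': 'abc', 'filetype': '.txt', 'path': 'C:/mydir'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/'},
            {'filename': 'def', 'filetype': '.zip', 'path': 'C:/mydir2'}]

dst_list = []
seen_items = set()
for dictionary in src_list:
    # Key over every item except the unique 'path'; sorted() makes the
    # key independent of each dict's insertion order.
    t = tuple(sorted((k, v) for k, v in dictionary.items() if k != 'path'))
    if t not in seen_items:
        seen_items.add(t)
        dst_list.append(dictionary)

print(dst_list)
```

Unlike the dict-comprehension answer, this keeps the first copy of each file seen ('C:/' in both groups here), which may or may not be what you want.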
