
Get unique values from a nested list in Python

I have a nested list (a list of lists) and I want to remove the duplicates, but I'm getting an error. Here is an example:

images = [
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        }, 
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ],
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        }, 
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ],
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        }, 
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ]
]

In the end, images should contain only:

[
    [
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "marine-transportation-transports-maritimes.xml"
        }, 
        {
            "image_link": "1969.1523.001.aa.cs.jpg", 
            "catalogue_number": "1969.1523", 
            "dataset_name": "railway-transportation-transports-ferroviaires.xml"
        }
    ]
]

I'm using the set function:

set.__doc__
'set() -> new empty set object\nset(iterable) -> new set object\n\nBuild an unordered collection of unique elements.'

My traceback:

list(set(images))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
TypeError: unhashable type: 'list'

To put it more simply: how can I remove all the duplicates in this example?

example = [[{'a': 1, 'b': 2}, 'w', 2], [{'a': 1, 'b': 2}, 'w', 2]]
# result:
# example = [[{'a': 1, 'b': 2}, 'w', 2]]

It seems like you want something like this:

>>> example = [ [{'a':1, 'b':2}, 'w', 2], [{'a':1, 'b':2}, 'w', 2] ]
>>> l = []
>>> for i in example:
...     if i not in l:
...         l.append(i)
...

>>> l
[[{'b': 2, 'a': 1}, 'w', 2]]
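The loop above only uses == comparisons, so it works even though the items are unhashable. Below is a minimal sketch of the same idea wrapped in a reusable function (the name unique_by_equality is hypothetical, not from any library). Because each "in" check scans the result list, it is quadratic in the number of items, which is fine for small lists.

```python
def unique_by_equality(items):
    """Remove duplicates using == comparisons only (works for unhashable items).

    Each membership test scans the result list, so this is O(n**2).
    Keeps the first occurrence of each item, in the original order.
    """
    result = []
    for item in items:
        if item not in result:  # list membership uses ==, not hashing
            result.append(item)
    return result


example = [[{'a': 1, 'b': 2}, 'w', 2], [{'a': 1, 'b': 2}, 'w', 2]]
print(unique_by_equality(example))  # [[{'a': 1, 'b': 2}, 'w', 2]]
```

Applied to the question's images list, this returns a list containing only the first inner list, since all three inner lists compare equal.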

The set and dict containers rely on hashing their elements (or keys). Mutable containers such as list (and set and dict themselves) cannot be hashed: they may change later, so a constant hash value would make no sense.
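A quick way to see which objects can go into a set is to try hashing them. This sketch uses a small hypothetical helper, is_hashable, that simply checks whether hash() raises TypeError. Note that a tuple is only hashable if everything inside it is hashable too.

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


print(is_hashable((1, 'w')))        # True: tuple of hashable items
print(is_hashable([1, 'w']))        # False: list is mutable
print(is_hashable({'a': 1}))        # False: dict is mutable
print(is_hashable((1, [2])))        # False: tuple containing a list
print(is_hashable(frozenset({1})))  # True: frozenset is immutable
```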

But you could convert all your data to (nested) tuples and finally into a set. Since tuple is an immutable container, and your data is hashable (strings), this works. Here's a somewhat ugly one-liner for your particular images case that does the trick:

images_set = set([tuple([tuple(sorted(image_dict.items()))
    for image_dict in inner_list]) for inner_list in images])

and

print(images_set)

prints

{((('catalogue_number', '1969.1523'),
   ('dataset_name', 'marine-transportation-transports-maritimes.xml'),
   ('image_link', '1969.1523.001.aa.cs.jpg')),
  (('catalogue_number', '1969.1523'),
   ('dataset_name', 'railway-transportation-transports-ferroviaires.xml'),
   ('image_link', '1969.1523.001.aa.cs.jpg')))}

EDIT: There's no guaranteed order for a dictionary's items(), so I also added sorted to ensure a consistent order. (Even in Python 3.7+, where dicts keep insertion order, two equal dicts built in different orders return their items in different orders.)
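The one-liner above returns tuples, not the original list-of-lists-of-dicts shape. A more general sketch, assuming the data contains only lists, dicts and hashable scalars, is to "freeze" each item recursively just to use it as a set key, while keeping the original objects in the result. The helper names freeze and unique_nested are hypothetical. Instead of sorted, this version uses frozenset for dict items, which ignores order and does not require the keys to be mutually comparable. One caveat of this choice: a list and a tuple with the same contents freeze to the same key.

```python
def freeze(obj):
    """Recursively convert lists and dicts into hashable equivalents."""
    if isinstance(obj, dict):
        # frozenset ignores insertion order, so equal dicts freeze identically
        return frozenset((key, freeze(value)) for key, value in obj.items())
    if isinstance(obj, list):
        return tuple(freeze(item) for item in obj)
    return obj  # assumed to be hashable already (str, int, ...)


def unique_nested(items):
    """Remove duplicates from a list of unhashable items, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        key = freeze(item)  # hashable stand-in used only for the lookup
        if key not in seen:
            seen.add(key)
            result.append(item)  # keep the original, unfrozen object
    return result


example = [[{'a': 1, 'b': 2}, 'w', 2], [{'b': 2, 'a': 1}, 'w', 2]]
print(unique_nested(example))  # [[{'a': 1, 'b': 2}, 'w', 2]]
```

Because the lookup is a set operation, this runs in roughly linear time, unlike the list-membership loop.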

You can use compiler.ast.flatten to flatten your list, convert each dictionary to a hashable object (a tuple of its items) so the set can remove duplicates, and then convert back to dict, all in one list comprehension. Note that the compiler module exists only in Python 2; it was removed in Python 3.

>>> from compiler.ast import flatten
>>> [dict(item) for item in set(tuple(i.items()) for i in flatten(images))]
[{'image_link': '1969.1523.001.aa.cs.jpg', 'catalogue_number': '1969.1523', 'dataset_name': 'marine-transportation-transports-maritimes.xml'}, {'image_link': '1969.1523.001.aa.cs.jpg', 'catalogue_number': '1969.1523', 'dataset_name': 'railway-transportation-transports-ferroviaires.xml'}]
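Since compiler.ast.flatten is not available in Python 3, here is a sketch of the same flatten-then-deduplicate approach, assuming exactly one level of nesting (as in images) and hashable dict values (strings here). itertools.chain.from_iterable does the flattening, and dict.fromkeys removes duplicates while keeping first-seen order. frozenset(d.items()) is used instead of tuple(i.items()) so that key order does not matter. As with the original answer, the result is a flat list of unique dicts, not the nested structure, and the rebuilt dicts may list their keys in a different order than the originals.

```python
from itertools import chain

# Sample data shaped like the question's images list (three identical inner lists)
marine = {
    "image_link": "1969.1523.001.aa.cs.jpg",
    "catalogue_number": "1969.1523",
    "dataset_name": "marine-transportation-transports-maritimes.xml",
}
railway = {
    "image_link": "1969.1523.001.aa.cs.jpg",
    "catalogue_number": "1969.1523",
    "dataset_name": "railway-transportation-transports-ferroviaires.xml",
}
images = [[dict(marine), dict(railway)] for _ in range(3)]

# chain.from_iterable flattens exactly one level of nesting
flat = chain.from_iterable(images)

# frozenset(d.items()) is a hashable, order-independent stand-in for each dict;
# dict.fromkeys keeps only the first occurrence of each, in order
unique = [dict(items) for items in dict.fromkeys(frozenset(d.items()) for d in flat)]

print(unique)
```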
