List of unique dictionaries

Let's say I have a list of dictionaries:

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

How can I obtain a list of unique dictionaries (removing the duplicates)?

[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]

So make a temporary dict with the key being the id. This filters out the duplicates. The values() of the dict will be the list.

In Python 2.7:

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ]
>>> {v['id']:v for v in L}.values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 3:

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> list({v['id']:v for v in L}.values())
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

In Python 2.5/2.6:

>>> L=[
... {'id':1,'name':'john', 'age':34},
... {'id':1,'name':'john', 'age':34},
... {'id':2,'name':'hanna', 'age':30},
... ] 
>>> dict((v['id'],v) for v in L).values()
[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]
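
One detail worth knowing: when two dicts share an id, the comprehension keeps the last one it sees. If the first one should win instead, a minimal sketch could use setdefault. The helper name `unique_by_id` is mine, not part of the answer, and the output order relies on dicts preserving insertion order (Python 3.7+):

```python
def unique_by_id(dicts):
    """Keep the first dict seen for each 'id' (hypothetical helper)."""
    seen = {}
    for d in dicts:
        # setdefault stores d only if this id has not been stored yet.
        seen.setdefault(d['id'], d)
    return list(seen.values())

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
result = unique_by_id(L)
print(result)
```

For the sample data both variants give the same list, since the duplicates are identical.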

The usual way to keep just the unique elements of a collection is to use Python's set class. Just add all the elements to the set, then convert the set to a list, and bam, the duplicates are gone.

The problem, of course, is that a set() can only contain hashable entries, and a dict is not hashable.

If I had this problem, my solution would be to convert each dict into a string that represents the dict, then add all the strings to a set(), then read out the string values as a list() and convert them back to dicts.

A good representation of a dict in string form is JSON format. And Python has a built-in module for JSON (called json, of course).

The remaining problem is that the elements in a dict are not ordered, and when Python converts the dict to a JSON string, you might get two JSON strings that represent equivalent dictionaries but are not identical strings. The easy solution is to pass the argument sort_keys=True when you call json.dumps().
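
A minimal sketch of that idea follows. It is a slight variation on the description above: the JSON string is used only as a lookup key and the original dicts are returned, which keeps the input order and avoids parsing the strings back. The name `unique_by_json` is an assumption of mine, and the values are assumed to be JSON-serializable:

```python
import json

def unique_by_json(dicts):
    """Drop dicts whose JSON form (with sorted keys) was already seen."""
    seen = set()
    result = []
    for d in dicts:
        # sort_keys=True makes equal dicts produce identical strings.
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result

L = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'age': 34, 'name': 'john', 'id': 1},   # same content, different key order
    {'id': 2, 'name': 'hanna', 'age': 30},
]
unique_dicts = unique_by_json(L)
print(unique_dicts)
```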

EDIT: This solution assumed that any part of a given dict could differ. If we can assume that every dict with the same "id" value matches every other dict with the same "id" value, then this is overkill; @gnibbler's solution would be faster and easier.

EDIT: Now there is a comment from André Lima explicitly saying that if the ID is a duplicate, it's safe to assume that the whole dict is a duplicate. So this answer is overkill and I recommend @gnibbler's answer.

In case the dictionaries are only uniquely identified by all of their items (no ID is available), you can use the JSON answer. The following is an alternative that does not use JSON, and will work as long as all dictionary values are hashable (which immutable built-in values such as numbers, strings, and tuples of these are):

[dict(s) for s in set(frozenset(d.items()) for d in L)]
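
To illustrate both the behaviour and the limitation, here is a small sketch with made-up sample data (not from the question). It shows that key order does not matter, and that an unhashable value such as a list makes the approach fail:

```python
L = [
    {'id': 1, 'name': 'john'},
    {'name': 'john', 'id': 1},      # same items, different key order
    {'id': 2, 'name': 'hanna'},
]
unique = [dict(s) for s in set(frozenset(d.items()) for d in L)]
# Set iteration order is arbitrary, so sort before printing or comparing.
unique.sort(key=lambda d: d['id'])
print(unique)

# A list value is unhashable, so building the frozenset raises TypeError.
try:
    frozenset({'id': 3, 'tags': ['a', 'b']}.items())
    failed = False
except TypeError:
    failed = True
print(failed)
```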

Here's a reasonably compact solution, though I suspect it is not particularly efficient (to put it mildly):

>>> ds = [{'id':1,'name':'john', 'age':34},
...       {'id':1,'name':'john', 'age':34},
...       {'id':2,'name':'hanna', 'age':30}
...       ]
>>> map(dict, set(tuple(sorted(d.items())) for d in ds))
[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]

You can use the numpy library (works for Python 2.x only):

   import numpy as np 

   list_of_unique_dicts=list(np.unique(np.array(list_of_dicts)))

To make it work with Python 3.x (and recent versions of numpy), you need to convert the array of dicts to a numpy array of strings; note that the result then holds the string form of each dict rather than dict objects, e.g.:

list_of_unique_dicts=list(np.unique(np.array(list_of_dicts).astype(str)))

a = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

b = list({x['id']:x for x in a}.values())  # list() so Python 3 prints a list too

print(b)

Outputs:

[{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Since the id is sufficient for detecting duplicates, and the id is hashable: run 'em through a dictionary that has the id as the key. The value for each key is the original dictionary.

deduped_dicts = dict((item["id"], item) for item in list_of_dicts).values()

In Python 3, values() doesn't return a list; you'll need to wrap the whole right-hand side of that expression in list(), and you can write the meat of the expression more economically as a dict comprehension:

deduped_dicts = list({item["id"]: item for item in list_of_dicts}.values())

Note that on Python versions before 3.7 the result likely will not be in the same order as the original. If that's a requirement there, you could use a collections.OrderedDict instead of a dict; from Python 3.7 on, a plain dict preserves insertion order.
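
As a rough sketch of the order-preserving variant (the sample data is reordered here on purpose so that the effect is visible; it is not the data from the question):

```python
from collections import OrderedDict

list_of_dicts = [
    {'id': 2, 'name': 'hanna', 'age': 30},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
# Each id keeps the position of its first appearance; a later dict with
# the same id replaces the stored value but not the position.
by_id = OrderedDict((item['id'], item) for item in list_of_dicts)
deduped_dicts = list(by_id.values())
print(deduped_dicts)
```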

As an aside, it may make a good deal of sense to just keep the data in a dictionary that uses the id as key to begin with.

We can do this with pandas:

import pandas as pd
yourdict=pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[293]: [{'age': 34, 'id': 1, 'name': 'john'}, {'age': 30, 'id': 2, 'name': 'hanna'}]

Note that this differs slightly from the accepted answer.

drop_duplicates checks all columns, and a row is dropped only if every column matches another row.

For example:

If we change the name in the 2nd dict from john to peter:

L=[
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'peter', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
pd.DataFrame(L).drop_duplicates().to_dict('records')
Out[295]: 
[{'age': 34, 'id': 1, 'name': 'john'},
 {'age': 34, 'id': 1, 'name': 'peter'},# this dict is still kept in the output
 {'age': 30, 'id': 2, 'name': 'hanna'}]

Expanding on John La Rooy's answer (Python - List of unique dictionaries), making it a bit more flexible:

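def dedup_dict_list(list_of_dicts: list, columns: list) -> list:
    # Key each row on a tuple of the chosen columns; unlike ''.join, this
    # also works for non-string values such as integer ids.
    return list({tuple(row[column] for column in columns): row
                 for row in list_of_dicts}.values())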

Calling the function:

sorted_list_of_dicts = dedup_dict_list(
    unsorted_list_of_dicts, ['id', 'name'])

I don't know if you only want the id of the dicts in your list to be unique, but if the goal is a set of dicts that are unique across the values of all keys, you should use a tuple key like this in your comprehension:

>>> L=[
...     {'id':1,'name':'john', 'age':34},
...    {'id':1,'name':'john', 'age':34}, 
...    {'id':2,'name':'hanna', 'age':30},
...    {'id':2,'name':'hanna', 'age':50}
...    ]
>>> len(L)
4
>>> L=list({(v['id'], v['age'], v['name']):v for v in L}.values())
>>> L
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}, {'id': 2, 'name': 'hanna', 'age': 50}]
>>> len(L)
3

Hope it helps you or anyone else with the same concern.

There are a lot of answers here, so let me add another:

import json
from typing import List

def dedup_dicts(items: List[dict]):
    dedupped = [ json.loads(i) for i in set(json.dumps(item, sort_keys=True) for item in items)]
    return dedupped

items = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
dedup_dicts(items)

I have summarized my favorites to try out:

https://repl.it/@SmaMa/Python-List-of-unique-dictionaries

# ----------------------------------------------
# Setup
# ----------------------------------------------

myList = [
  {"id":"1", "lala": "value_1"},
  {"id": "2", "lala": "value_2"}, 
  {"id": "2", "lala": "value_2"}, 
  {"id": "3", "lala": "value_3"}
]
print("myList:", myList)

# -----------------------------------------------
# Option 1 if objects have a unique identifier
# -----------------------------------------------

myUniqueList = list({myObject['id']:myObject for myObject in myList}.values())
print("myUniqueList:", myUniqueList)

# -----------------------------------------------
# Option 2 if uniquely identified by whole object
# -----------------------------------------------

myUniqueSet = [dict(s) for s in set(frozenset(myObject.items()) for myObject in myList)]
print("myUniqueSet:", myUniqueSet)

# -----------------------------------------------
# Option 3 for hashable objects (not dicts)
# -----------------------------------------------

myHashableObjects = list(set(["1", "2", "2", "3"]))
print("myHashAbleList:", myHashableObjects)

In Python 3, a simple trick, but based on a unique field (id):

data = [ {'id': 1}, {'id': 1}]

list({ item['id'] : item for item in data}.values())

If there is no unique id in the dictionaries, then I'd keep it simple and define a function as follows:

def unique(sequence):
    result = []
    for item in sequence:
        if item not in result:
            result.append(item)
    return result

The advantage of this approach is that you can reuse this function for any objects that can be compared for equality. It makes your code very readable, works in all modern versions of Python, and preserves the order of the list. It is fast enough for short lists, but each membership test scans the result list, so the running time grows quadratically with the number of items.

>>> L = [
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 1, 'name': 'john', 'age': 34},
... {'id': 2, 'name': 'hanna', 'age': 30},
... ] 
>>> unique(L)
[{'id': 1, 'name': 'john', 'age': 34}, {'id': 2, 'name': 'hanna', 'age': 30}]
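
Because this function relies only on equality, it also copes with dicts whose values are unhashable, which set-based approaches cannot handle directly. A small sketch with made-up records (the `tags` field is an assumption for illustration):

```python
def unique(sequence):
    result = []
    for item in sequence:
        if item not in result:
            result.append(item)
    return result

# The list values make these dicts unusable as frozenset items,
# but plain equality comparison still works.
records = [
    {'id': 1, 'tags': ['a', 'b']},
    {'id': 1, 'tags': ['a', 'b']},
    {'id': 2, 'tags': []},
]
deduped = unique(records)
print(deduped)
```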

In Python 3.6+ (what I've tested), just use:

import json

#Toy example, but will also work for your case 
myListOfDicts = [{'a':1,'b':2},{'a':1,'b':2},{'a':1,'b':3}]
#Start by sorting each dictionary by keys
myListOfDictsSorted = [sorted(d.items()) for d in myListOfDicts]

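#Using json methods with set() to get unique dicts; each sorted dict was
#dumped as a list of [key, value] pairs, so dict() rebuilds the dictionary
myListOfUniqueDicts = list(map(dict, map(json.loads, set(map(json.dumps, myListOfDictsSorted)))))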

print(myListOfUniqueDicts)

Explanation: we map json.dumps over the dictionaries to encode them as JSON strings, which are immutable and hashable. set can then be used to produce an iterable of unique strings. Finally, we convert back to our dictionary representation using json.loads. Note that initially, one must sort by keys to arrange the dictionaries in a unique form. This is valid for Python 3.6+ since dictionaries preserve insertion order (an implementation detail in CPython 3.6, guaranteed from 3.7).

A quick-and-dirty solution is to just generate a new list.

sortedlist = []

for item in listwhichneedssorting:
    if item not in sortedlist:
        sortedlist.append(item)

Well, all the answers mentioned here are good, but some of them raise an error if the dictionary items contain a nested list or dictionary, so I propose a simple answer:

import ast

a = [str(i) for i in a]
a = list(set(a))
# ast.literal_eval only parses Python literals, so it is safer than eval
a = [ast.literal_eval(i) for i in a]
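
An alternative that avoids the round trip through strings is to turn each nested structure into a hashable key. The following is only a sketch under stated assumptions: `freeze` and `unique_nested` are hypothetical helper names, the leaf values are assumed to be hashable, and lists and tuples are deliberately treated as equivalent:

```python
def freeze(value):
    """Recursively convert dicts, lists and sets into hashable equivalents."""
    if isinstance(value, dict):
        return frozenset((k, freeze(v)) for k, v in value.items())
    if isinstance(value, (list, tuple)):
        return tuple(freeze(v) for v in value)
    if isinstance(value, set):
        return frozenset(freeze(v) for v in value)
    return value  # assumed to be hashable already

def unique_nested(dicts):
    """Return the dicts in original order, without deep duplicates."""
    seen = set()
    result = []
    for d in dicts:
        key = freeze(d)
        if key not in seen:
            seen.add(key)
            result.append(d)
    return result

a = [
    {'id': 1, 'tags': ['x', 'y'], 'meta': {'k': 1}},
    {'meta': {'k': 1}, 'tags': ['x', 'y'], 'id': 1},  # same content, other key order
    {'id': 2, 'tags': [], 'meta': {}},
]
deduped = unique_nested(a)
print(deduped)
```

Unlike the str() approach, this does not depend on the key order inside each dict.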

Objects can fit into sets. You can work with objects instead of dicts and, if needed, convert back to a list of dicts after all set insertions. Example:

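class Person:
    def __init__(self, id, age, name):
        self.id = id
        self.age = age
        self.name = name

    # A set detects duplicates through __eq__ and __hash__; without them,
    # two Person objects with the same fields count as different elements.
    def __eq__(self, other):
        return (isinstance(other, Person) and
                (self.id, self.age, self.name) == (other.id, other.age, other.name))

    def __hash__(self):
        return hash((self.id, self.age, self.name))

my_set = {Person(id=2, age=3, name='Jhon')}

my_set.add(Person(id=3, age=34, name='Guy'))

# duplicate of the first Person, so the set does not grow
my_set.add(Person(id=2, age=3, name='Jhon'))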

# if needed convert to list of dicts
list_of_dict = [{'id': obj.id,
                 'name': obj.name,
                 'age': obj.age} for obj in my_set]
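
For set membership to remove duplicates, the class has to define equality and hashing on its fields; plain instances are otherwise compared by identity. One way to get both without writing them by hand, sketched here as an assumption rather than as the answer's own method, is a frozen dataclass (Python 3.7+):

```python
from dataclasses import dataclass, asdict

@dataclass(frozen=True)  # frozen + default eq generates __eq__ and __hash__
class Person:
    id: int
    age: int
    name: str

people = {Person(id=2, age=3, name='Jhon')}
people.add(Person(id=3, age=34, name='Guy'))
people.add(Person(id=2, age=3, name='Jhon'))  # equal to the first, so ignored

# Sets are unordered, so sort when converting back to a list of dicts.
list_of_dict = sorted((asdict(p) for p in people), key=lambda d: d['id'])
print(list_of_dict)
```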

Let me add mine.

  1. sort the target dict so that {'a' : 1, 'b': 2} and {'b': 2, 'a': 1} are not treated differently

  2. serialize it as JSON

  3. deduplicate via set (as set does not work with dicts)

  4. turn it back into a dict via json.loads

import json

[json.loads(i) for i in set([json.dumps(i) for i in [dict(sorted(i.items())) for i in target_dict]])]

A pretty straightforward option:

L = [
    {'id':1,'name':'john', 'age':34},
    {'id':1,'name':'john', 'age':34},
    {'id':2,'name':'hanna', 'age':30},
    ]


D = dict()
for l in L: D[l['id']] = l
output = list(D.values())
print(output)

Here's an implementation with little memory overhead, at the cost of not being as compact as the rest.

values = [ {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},
           {'id':1,'name':'john', 'age':34},
           {'id':2,'name':'hanna', 'age':30},
           {'id':1,'name':'john', 'age':34},]
count = {}
index = 0
while index < len(values):
    if values[index]['id'] in count:
        del values[index]
    else:
        count[values[index]['id']] = 1
        index += 1

Output:

[{'age': 30, 'id': 2, 'name': 'hanna'}, {'age': 34, 'id': 1, 'name': 'john'}]

This is the solution I found:

usedID = []

x = [
{'id':1,'name':'john', 'age':34},
{'id':1,'name':'john', 'age':34},
{'id':2,'name':'hanna', 'age':30},
]

# iterate over a copy, since removing from a list while looping over it skips items
for each in x[:]:
    if each['id'] in usedID:
        x.remove(each)
    else:
        usedID.append(each['id'])

print(x)

Basically, you check whether the ID is already in the list; if it is, delete the dictionary, and if not, append the ID to the list.
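
A variation on the same idea that leaves the input list untouched and tracks the seen IDs in a set (so each lookup is a hash lookup rather than a list scan) might look like this. The helper name `unique_by` and its `key` parameter are assumptions for illustration:

```python
def unique_by(items, key):
    """Return items in original order, keeping the first item per key(item)."""
    seen = set()
    result = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            result.append(item)
    return result

x = [
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 1, 'name': 'john', 'age': 34},
    {'id': 2, 'name': 'hanna', 'age': 30},
]
deduped = unique_by(x, key=lambda d: d['id'])
print(deduped)
```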
