Remove duplicates from nested dictionaries in a list

Quick and very basic newbie question.

If I have a list of dictionaries that looks like this:

L = []
L.append({"value1": value1, "value2": value2, "value3": value3, "value4": value4})

Let's say there are multiple entries whose value3 and value4 are identical to those of other dictionaries in the list. How can I quickly and easily find and remove those duplicate dictionaries?

Preserving order is of no importance.

Thanks.

EDIT:

If there are five inputs, like this:

L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

The output should look like this:

L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

Here's one way:

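from itertools import groupby

# Key on the two fields that define a duplicate.
keyfunc = lambda d: (d['value3'], d['value4'])

# groupby only merges adjacent items, so sort by the same key first.
giter = groupby(sorted(L, key=keyfunc), keyfunc)

# Keep the first dictionary from each group.
L2 = [next(g[1]) for g in giter]
print(L2)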

In Python 2.6 or 3.*:

import itertools
import operator
import pprint

L = [{"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"}]

getvals = operator.itemgetter('value3', 'value4')

L.sort(key=getvals)

result = []
for k, g in itertools.groupby(L, getvals):
    result.append(next(g))

L[:] = result
pprint.pprint(L)

Almost the same in Python 2.5, except that you have to use g.next() instead of next(g) in the append.
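
Why the sort matters: itertools.groupby only merges consecutive items with equal keys, so duplicates that are not adjacent survive unless the list is first sorted by the same key. Below is a minimal sketch, assuming Python 3. The shortened sample list and the names rows and first_per_group are made up for illustration and do not come from the answers above.

```python
import itertools
import operator

# Hypothetical sample data; only value3/value4 matter for deduplication.
rows = [
    {"value1": "a", "value3": "abcd", "value4": "gk"},
    {"value1": "b", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "c", "value3": "abcd", "value4": "gk"},
]

getvals = operator.itemgetter("value3", "value4")


def first_per_group(items):
    """Keep the first dictionary of each run of consecutive equal keys."""
    return [next(group) for _, group in itertools.groupby(items, getvals)]


# Unsorted: the two ("abcd", "gk") rows are not adjacent, so both survive.
unsorted_result = first_per_group(rows)

# Sorted by the same key: equal keys become adjacent and collapse to one row.
sorted_result = first_per_group(sorted(rows, key=getvals))

print(len(unsorted_result))  # 3
print(len(sorted_result))    # 2
```

Because sorted() is stable, the dictionary kept for each key is the one that appeared first in the original list.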

You can use a temporary set to record the (value3, value4) pairs already seen. The previous version of this code was buggy because it removed items inside the for loop.

seen = set()  # (value3, value4) pairs already kept
r = []
for i in l:
    key = (i['value3'], i['value4'])
    if key not in seen:  # first dictionary with this pair
        seen.add(key)
        r.append(i)
l = r

Your test:

l = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
    {"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
    {"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

outputs:

{'value4': 'gk', 'value3': 'abcd', 'value2': 'dsfds', 'value1': 'fssd'}
{'value4': 'sdfsdf', 'value3': 'dafdd', 'value2': 'asdas', 'value1': 'asdasd'}
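{'value4': 'sdlsld', 'value3': 'ldlsld', 'value2': 'dskksks', 'value1': 'asdasd'}

# Deleting from a list while iterating over it skips elements, so walk
# the indices backwards and delete each later duplicate by position.
# A duplicate must match on both value3 and value4.
for i in range(len(list) - 1, -1, -1):
    for j in range(i):
        if list[i]["value3"] == list[j]["value3"] and list[i]["value4"] == list[j]["value4"]:
            del list[i]
            break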

Tested with:

list = [{"value1": 'fssd', "value2": 'dsfds', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'asdas', "value3": 'dafdd', "value4": 'sdfsdf'},
{"value1": 'sdfsf', "value2": 'sdfsdf', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asddas', "value2": 'asdsa', "value3": 'abcd', "value4": 'gk'},
{"value1": 'asdasd', "value2": 'dskksks', "value3": 'ldlsld', "value4": 'sdlsld'}]

Worked fine for me :)

That's a list containing one dictionary, but assuming there are more dictionaries in the list l:

l = [ldict for ldict in l if ldict.get("value3") != value3 or ldict.get("value4") != value4]

But is that what you really want to do? Perhaps you need to refine your description.

BTW, don't use list as a name, since it is the name of a Python built-in.

EDIT: Assuming you started with a list of dictionaries, rather than a list of lists of one dictionary each, that should work with your example. It wouldn't work if either of the values were None, so something like this is better:

l = [ldict for ldict in l if not ( ("value3" in ldict and ldict["value3"] == value3) and ("value4" in ldict and ldict["value4"] == value4) )]

But it still seems like an unusual data structure.

EDIT: no need to use explicit gets.

Also, there are always tradeoffs in solutions. Without more info and without actually measuring, it's hard to know which performance tradeoffs matter most for the problem. But, as the Zen sez: "Simple is better than complex".
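
To see what this filter does in practice, here is a small sketch run against the question's sample data. It assumes Python 3, that the values are plain strings, and that value3 and value4 hold the pair "abcd" and "gk" (the answer leaves them undefined); the names records and filtered are illustrative only. Note that the filter drops every dictionary matching the pair, including the first one, rather than collapsing the duplicates down to a single entry.

```python
# Sample data from the question, with the values quoted as strings.
records = [
    {"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"},
]

# Assumed target pair; not specified in the original answer.
value3, value4 = "abcd", "gk"

# Keep only the dictionaries that do NOT match the target pair.
filtered = [
    d for d in records
    if not (("value3" in d and d["value3"] == value3)
            and ("value4" in d and d["value4"] == value4))
]

print(len(filtered))  # 2
```

If the goal is to keep one representative of each pair, a grouping or seen-set approach is the better fit.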

If I understand correctly, you want to discard matches that come later in the original list but do not care about the order of the resulting list, so:

(Tested with 2.5.2)

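tempDict = {}
for d in L[::-1]:  # reversed, so earlier entries overwrite later duplicates
    tempDict[(d["value3"], d["value4"])] = d
L[:] = tempDict.values()  # values() works on Python 2 and 3; itervalues() is Python 2 only
tempDict = None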
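
The same idea can be wrapped in a small function. This is only a sketch, assuming Python 3 and that the key values are hashable (strings, in the question's data); the function name dedupe_keep_first and its keys parameter are hypothetical and not part of the original answer.

```python
def dedupe_keep_first(dicts, keys=("value3", "value4")):
    """Return one dictionary per distinct combination of `keys`.

    Iterating in reverse means earlier dictionaries overwrite later ones,
    so the first occurrence in the original list is the one that survives.
    """
    by_key = {}
    for d in reversed(dicts):
        by_key[tuple(d[k] for k in keys)] = d
    return list(by_key.values())


L = [
    {"value1": "fssd", "value2": "dsfds", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "asdas", "value3": "dafdd", "value4": "sdfsdf"},
    {"value1": "sdfsf", "value2": "sdfsdf", "value3": "abcd", "value4": "gk"},
    {"value1": "asddas", "value2": "asdsa", "value3": "abcd", "value4": "gk"},
    {"value1": "asdasd", "value2": "dskksks", "value3": "ldlsld", "value4": "sdlsld"},
]

L[:] = dedupe_keep_first(L)  # replace the contents of L in place

# The result order is not the original order, so sort before printing.
print(sorted(d["value2"] for d in L))  # ['asdas', 'dsfds', 'dskksks']
```

This runs in a single pass over the list and needs no sorting, at the cost of not preserving the original order, which the question says does not matter.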
