What's the pythonic way to remove elements from a list of dicts if ONE of the values is not unique?

What would be the pythonic way to remove elements that are not unique for certain keys?

Let's say one has a list of dicts such as:

[
    {'a': 1, 'b': 'j'},
    {'a': 2, 'b': 'j'},
    {'a': 3, 'b': 'i'}
]

The expected output would remove the second element, because the key b equals j in more than one element. Thus:

[
    {'a': 1, 'b': 'j'},
    {'a': 3, 'b': 'i'}
]

This is what I have tried:

input = [
    {'a': 1, 'b': 'j'},
    {'a': 2, 'b': 'j'},
    {'a': 3, 'b': 'i'}
]

output = []
for input_element in input:
    if not output:
        output.append(input_element)
    else:
        for output_element in output:
            if input_element['b'] != output_element['b']:
                output.append(input_element)

Would the solution be simpler if that were a list of tuples, such as:

[(1, 'j'), (2, 'j'), (3, 'i')]

# to produce
[(1, 'j'), (3, 'i')]

Here is an approach using any() and a list comprehension:

Code:

l=[
    {'a': 1, 'b': 'j'},
    {'a': 2, 'b': 'j'},
    {'a': 3, 'b': 'i'}
]

new_l = []

for d in l:
    if any([d['b'] == x['b'] for x in new_l]):
        continue
    new_l.append(d)

print(new_l)

Output:

[{'a': 1, 'b': 'j'}, {'a': 3, 'b': 'i'}]

def drop_dup_key(src, key):
    '''src is the source list, and key is a function to obtain the key.'''
    keyset, result = set(), []
    for elem in src:
        keyval = key(elem)
        if keyval not in keyset:
            result.append(elem)
            keyset.add(keyval)
    return result

Use it like this:

drop_dup_key(in_list, lambda d: d.get('b'))
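
For a concrete run, the sketch below repeats drop_dup_key and applies it to the list from the question. The name in_list is taken from the call above; operator.itemgetter('b') is swapped in for the lambda purely as an illustration. Note that, unlike d.get('b'), it raises KeyError for a dict that has no 'b' key.

```python
from operator import itemgetter

def drop_dup_key(src, key):
    """Keep the first element of src for each distinct key(elem)."""
    keyset, result = set(), []
    for elem in src:
        keyval = key(elem)
        if keyval not in keyset:
            result.append(elem)
            keyset.add(keyval)
    return result

# Sample data from the question.
in_list = [
    {'a': 1, 'b': 'j'},
    {'a': 2, 'b': 'j'},
    {'a': 3, 'b': 'i'},
]

deduped = drop_dup_key(in_list, itemgetter('b'))
print(deduped)  # [{'a': 1, 'b': 'j'}, {'a': 3, 'b': 'i'}]
```

Because membership is checked against a set, this does one pass over the list instead of rescanning the result for every element.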

You could define a custom container class which implements the __eq__ and __hash__ magic methods. That way, you can use a set to remove "duplicates" (according to your criteria). This doesn't necessarily preserve order.

from itertools import starmap
from typing import NamedTuple

class MyTuple(NamedTuple):
    a: int
    b: str

    def __eq__(self, other):
        return self.b == other.b

    def __hash__(self):
        # ord() accepts only a single-character string; hash(self.b) would handle any string
        return ord(self.b)


print(set(starmap(MyTuple, [(1, 'j'), (2, 'j'), (3, 'i')])))

Output:

{MyTuple(a=3, b='i'), MyTuple(a=1, b='j')}
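
If order matters, one possible alternative (not part of the answer above) is to key a plain dict on the second field. This is only a sketch: it assumes Python 3.7+, where dicts keep insertion order, and the names rows, first_by_b and unique_rows are made up for the example.

```python
rows = [(1, 'j'), (2, 'j'), (3, 'i')]

first_by_b = {}
for row in rows:
    # setdefault stores the row only if its second field has not been seen yet,
    # so the first occurrence wins and later ones are ignored.
    first_by_b.setdefault(row[1], row)

unique_rows = list(first_by_b.values())
print(unique_rows)  # [(1, 'j'), (3, 'i')]
```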

I suggest this implementation:

_missing = object()
def dedupe(iterable, selector=_missing):
    "De-duplicate a sequence based on a selector"
    keys = set()
    if selector is _missing: selector = lambda e: e
    for e in iterable:
        if selector(e) in keys: continue
        keys.add(selector(e))
        yield e

Advantages:

  • Returns a generator:
    It iterates the original collection just once, lazily. That can be useful and/or faster in some scenarios, especially if you chain additional query operations.

     input = [{'a': 1, 'b': 'j'}, {'a': 2, 'b': 'j'}, {'a': 3, 'b': 'i'}]
     s = dedupe(input, lambda x: x['b'])
     s = map(lambda e: e['a'], s)
     sum(s)  # Only now the list is iterated. Result: 4
  • Accepts any kind of iterable:
    Be it a list, set, dictionary or a custom iterable class. You can construct whatever collection type you want out of it, without iterating multiple times.

     d = {'a': 1, 'b': 1, 'c': 2}
     {k: v for k, v in dedupe(d.items(), lambda e: e[1])}  # Result (dict): {'a': 1, 'c': 2}
     {*dedupe(d.items(), lambda e: e[1])}  # Result (set of tuples): {('a', 1), ('c', 2)}
  • Takes an optional selector function (or any callable):
    This gives you the flexibility to re-use this function in many different contexts, with any custom logic or types. If the selector is absent, it compares whole elements.

     # de-duping based on absolute value:
     (*dedupe([-3, -2, -2, -1, 0, 1, 1, 2, 3, 3], abs),)  # Result: (-3, -2, -1, 0)
     # de-duping without selector:
     (*dedupe([-3, -2, -2, -1, 0, 1, 1, 2, 3, 3]),)  # Result: (-3, -2, -1, 0, 1, 2, 3)
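
To illustrate the laziness point, here is a small sketch that feeds dedupe an endless iterator. The function is repeated so the block runs on its own, with the one change that the selector is called once per element. The use of itertools.count and islice and the name first_three are assumptions for the example.

```python
from itertools import count, islice

_missing = object()

def dedupe(iterable, selector=_missing):
    """De-duplicate an iterable based on a selector."""
    keys = set()
    if selector is _missing:
        selector = lambda e: e
    for e in iterable:
        key = selector(e)  # computed once per element
        if key in keys:
            continue
        keys.add(key)
        yield e

# count() never ends; dedupe only pulls as many items as islice asks for.
first_three = list(islice(dedupe(count(), lambda n: n % 5), 3))
print(first_three)  # [0, 1, 2]
```

Asking for more than five items here would never return, since n % 5 has only five distinct values and the generator would keep scanning for a sixth.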

The comparison of tuples to dictionaries isn't quite accurate, since the tuples only contain the dictionary values, not the keys, and I believe you are asking about duplicate key:value pairs.

Here is a solution which I believe solves your problem, but might not be as pythonic as possible.

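x = [
    {'a': 1, 'b': 'j'},
    {'a': 2, 'b': 'j'},
    {'a': 3, 'b': 'i'}
]

seen = set()  # every (key, value) pair encountered so far
kept = []

for d in x:
    keep = True
    for k, v in d.items():
        if (k, v) in seen:
            keep = False
            break
        seen.add((k, v))
    if keep:
        kept.append(d)

print(kept)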

Output:

[{'a': 1, 'b': 'j'}, {'a': 3, 'b': 'i'}]
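
Since the question mentions "certain keys", a possible variation is to check only a chosen subset of keys. The sketch below is an assumption-laden variant, not the code above: the function name drop_repeated_values and its keys parameter are hypothetical, values must be hashable, and, unlike the loop above, it records pairs only from dicts that are kept.

```python
def drop_repeated_values(dicts, keys):
    """Keep a dict only if none of its values under `keys` was seen before."""
    seen = set()
    kept = []
    for d in dicts:
        # (key, value) pairs for the selected keys that exist in this dict
        pairs = {(k, d[k]) for k in keys if k in d}
        if pairs & seen:
            continue  # at least one selected value already appeared
        seen |= pairs
        kept.append(d)
    return kept

x = [
    {'a': 1, 'b': 'j'},
    {'a': 2, 'b': 'j'},
    {'a': 3, 'b': 'i'},
]

result = drop_repeated_values(x, ['b'])
print(result)  # [{'a': 1, 'b': 'j'}, {'a': 3, 'b': 'i'}]
```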
