Identify & store the values that appear multiple times in a dictionary (Python)

I have a list of dictionaries in which some "term" values are repeated:

terms_dict = [{'term': 'potato', 'cui': '123AB'}, {'term': 'carrot', 'cui': '222AB'}, {'term': 'potato', 'cui': '456AB'}]

As you can see, the term 'potato' appears more than once. I would like to store this 'term' in a variable for future reference, then remove every dictionary with a repeated term from terms_dict, leaving only the 'carrot' dictionary in the list.

Desired output:

repeated_terms = ['potato'] ## identified and stored terms that are repeated in terms_dict. 

new_terms_dict = [{'term': 'carrot', 'cui': '222AB'}] ## new dict with the unique term.

Idea:

I can certainly create a new dictionary with unique terms; however, I am stuck on actually identifying the "term" that is repeated and storing it in a list.

Is there a Pythonic way of finding/printing/storing the repeated values?

You can use collections.Counter for the task:

from collections import Counter

terms_dict = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
]

c = Counter(d["term"] for d in terms_dict)

repeated_terms = [k for k, v in c.items() if v > 1]
new_terms_dict = [d for d in terms_dict if c[d["term"]] == 1]

print(repeated_terms)
print(new_terms_dict)

Prints:

['potato']
[{'term': 'carrot', 'cui': '222AB'}]
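If the same split is needed in more than one place, the Counter approach above could be wrapped in a small helper. The following is only a sketch: the function name split_repeated, its key parameter, and the two extra sample entries are illustrative assumptions, not part of the original answer. It also shows that a term occurring three times is still reported only once.

```python
from collections import Counter


def split_repeated(records, key="term"):
    """Return (repeated_values, records_whose_value_is_unique).

    `split_repeated` and `key` are hypothetical names chosen for this sketch.
    """
    counts = Counter(record[key] for record in records)
    # Counter keeps first-seen order, so repeated values come out in input order.
    repeated = [value for value, n in counts.items() if n > 1]
    unique_records = [record for record in records if counts[record[key]] == 1]
    return repeated, unique_records


sample = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
    {"term": "potato", "cui": "789AB"},  # third occurrence, added for illustration
    {"term": "leek", "cui": "333AB"},    # extra unique entry, added for illustration
]

repeated_terms, new_terms_dict = split_repeated(sample)
print(repeated_terms)  # ['potato']
print(new_terms_dict)  # [{'term': 'carrot', 'cui': '222AB'}, {'term': 'leek', 'cui': '333AB'}]
```

Every record is counted once and then filtered once, so the work stays linear in the length of the list.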

You can use drop_duplicates and duplicated from pandas:

>>> import pandas as pd
>>> df = pd.DataFrame(terms_dict)
>>> df.term[df.term.duplicated()].unique().tolist() # repeats, each listed once even if it occurs 3+ times
['potato']
>>> df.drop_duplicates('term', keep=False).to_dict('records') # without repeats
[{'term': 'carrot', 'cui': '222AB'}]
