
How to check in Python if a key:value pair already exists in an array of key:value pairs

I have an array of key:value pairs which I am generating using a loop over the contents (entity extraction) of documents.

entity_array.append({
    "key": entity.label_,
    "value": entity.text
})

I would like to add a check so that a pair is not appended if the same key and value already exist, but I am unsure how to check on key AND value together. The reason is that I am getting a lot of duplicate rows.

I'm able to check if the key OR the value exists, but this doesn't give the desired result because the same entity could belong to multiple keys.

Any help appreciated.

It sounds like the data structure you are using is causing you some issues. If you want to keep track of duplicate combinations of entity.label_ and entity.text values, consider treating the combination as a namedtuple and using a set to quickly check for duplicates:

import collections

Entity = collections.namedtuple("Entity", ["key", "value"])  # a tuple called "Entity" with named elements
entity_set = set()  # empty set where we will store deduplicated combinations of label and text

for entity in your_iterable_here:
    entity_set.add(Entity(key=entity.label_, value=entity.text))  # add to the set if it's not there already, otherwise do nothing

You can even do this as a one-liner if you want:

entity_set = set(Entity(key=entity.label_, value=entity.text) for entity in your_iterable_here)

When you are done, you will have a collection of unique key/value pairs in entity_set. If you absolutely need the entities in the data structure mentioned in the question (a list of dicts), one option is to take advantage of the namedtuple._asdict() method (which, despite the underscore in the name, is fully documented and part of the "public" namedtuple interface):

entity_array = [entity._asdict() for entity in entity_set]
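To see the approach end to end, here is a minimal sketch. The SimpleNamespace objects are hypothetical stand-ins for whatever your entity extraction yields (only the label_ and text attributes are assumed), and the sample values are made up for illustration.

```python
import collections
from types import SimpleNamespace

Entity = collections.namedtuple("Entity", ["key", "value"])

# Hypothetical stand-ins for extracted entities; only label_ and text are assumed.
sample_entities = [
    SimpleNamespace(label_="ORG", text="Apple"),
    SimpleNamespace(label_="ORG", text="Apple"),      # exact duplicate, dropped by the set
    SimpleNamespace(label_="PRODUCT", text="Apple"),  # same text, different key, kept
]

# The set keeps one Entity per distinct (key, value) combination.
entity_set = {Entity(key=e.label_, value=e.text) for e in sample_entities}

# Convert back to the list-of-dicts shape used in the question.
entity_array = [entity._asdict() for entity in entity_set]

print(len(entity_array))  # 2
```

Note that the duplicate is detected on the combination of key and value, so "Apple" survives once under each label.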

There are two caveats to this solution:

  1. Whatever entity.label_ and entity.text are, they must be hashable to be put into a set. There are ways around this if the things you are storing are not simple values like strings, but it can get complicated.
  2. The order of the entities generated by your_iterable_here will not be preserved. There are easy ways around this, like using an OrderedDict (or, on Python 3.7+, a plain dict) with Entity keys and bool values instead of a set.
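As a sketch of one way around the ordering caveat, you can keep the original list and track the pairs already seen in a separate set of tuples. This is a swapped-in variant rather than the OrderedDict approach mentioned above. The sample entities below are hypothetical stand-ins; only the label_ and text attributes are assumed.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for extracted entities.
sample_entities = [
    SimpleNamespace(label_="ORG", text="Apple"),
    SimpleNamespace(label_="PRODUCT", text="Apple"),
    SimpleNamespace(label_="ORG", text="Apple"),   # duplicate of the first
    SimpleNamespace(label_="ORG", text="Google"),
]

seen = set()        # (key, value) tuples already appended
entity_array = []   # result, in first-seen order

for entity in sample_entities:
    pair = (entity.label_, entity.text)
    if pair not in seen:          # check on key AND value together
        seen.add(pair)
        entity_array.append({"key": entity.label_, "value": entity.text})

print(entity_array)
```

The membership test on the set is a constant-time lookup on average, so this stays fast even when the list grows large.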

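You can write your own function for this. For example, look up the given key and compare the stored value with your expected value:

def exists(dict_: dict, key: str, value: object) -> bool:
    # True only if the key is present AND maps to the expected value
    # (a bare dict_.get(key) == value would wrongly match a missing key when value is None)
    return key in dict_ and dict_[key] == value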
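The function above checks a single dictionary. For the list of {"key": ..., "value": ...} dicts from the question, a similar check might look like the sketch below. The helper names pair_exists and append_unique are hypothetical, and the sample values are made up.

```python
def pair_exists(entity_array, key, value):
    """Return True if a dict with this exact key AND value is already in the list."""
    return any(item["key"] == key and item["value"] == value for item in entity_array)


def append_unique(entity_array, key, value):
    """Append the pair only if it is not already present; report whether it was added."""
    if pair_exists(entity_array, key, value):
        return False
    entity_array.append({"key": key, "value": value})
    return True


entity_array = []
append_unique(entity_array, "ORG", "Apple")      # added
append_unique(entity_array, "ORG", "Apple")      # skipped: same key and value
append_unique(entity_array, "PRODUCT", "Apple")  # added: same value, different key
print(entity_array)
```

Each check scans the whole list, so this is simple but linear in the list length; for many entities a set of seen pairs is likely the better fit.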

You'll have to check two conditions - (a) if the key is not present in the target dictionary, and (b) if the key is present but the value is different. In both cases, you will have to add the new value to the dictionary.

For example, suppose main_dict is your main dictionary and values_to_add is a new dictionary holding values that need to be added to it (the name dict is avoided because it would shadow Python's built-in type). The code below does what you're looking to do:

main_dict = {
    "Key_1": "Value_1",
    "Key_2": "Value_2",
    "Key_3": "Value_3"
}

values_to_add = {
    "Key_1": "Value_X",
    "Key_4": "Value_4"
}

for key, value in values_to_add.items():
    # add the value if the key is missing, or overwrite it if the stored value differs
    if key not in main_dict or main_dict[key] != value:
        main_dict[key] = value

print(main_dict)
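One thing to keep in mind is that a flat dictionary holds only one value per key, so a differing value overwrites the old one. If one key needs to hold several distinct values, as with entity labels in the question, a dictionary of sets is one possible alternative. The sketch below uses hypothetical names and made-up sample pairs.

```python
from collections import defaultdict

# Hypothetical (key, value) pairs, including one exact duplicate.
pairs = [
    ("ORG", "Apple"),
    ("ORG", "Google"),
    ("ORG", "Apple"),      # duplicate, ignored by the set
    ("PRODUCT", "Apple"),
]

values_by_key = defaultdict(set)  # key -> set of distinct values
for key, value in pairs:
    values_by_key[key].add(value)

# Flatten back into a list of dicts; values are sorted for a stable order.
entity_array = [
    {"key": key, "value": value}
    for key, values in values_by_key.items()
    for value in sorted(values)
]
print(entity_array)
```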

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 