
How can I detect "missing" keys from a Python dict?

I have the following Python dict:

d = {'A-x': 1, 'A-y': 2,
     'B-x': 3,
     'C-x': 4, 'C-y': 5,
     'D-x': 6, 'D-y': 7,
     'E-x': 8}

The keys follow a "Level-SubLevel" pattern.
There is no B-y or E-y key, so levels B and E can be considered "missing".

I'm trying to detect the Levels that have "missing" keys, so my expected output would be the list:

['B', 'E']

So far I have the following working solution...

import numpy as np
from itertools import product

a = np.array([k.split('-') for k in d])
all_keys = ['-'.join(x) for x in list(product(set(a[:, 0]), set(a[:, 1])))]
missing_keys = [x.split('-')[0] for x in all_keys - d.keys()]

... but I feel there must be a better/cleaner solution, ideally using only the Python standard library.

I should also clarify that, in this particular case, the "SubLevel" portion of the key can only take one of two values: "x" or "y". Also, "...-x" will always exist; only "...-y" may be missing.

Any suggestions would be much appreciated.

Using only Python standard library functionality, I can offer this solution:

# Generate list of list of section/subsection pairs
a = [k.split('-') for k in d.keys()] 

# Generate set of sections
sec = set([x[0] for x in a])  
# {'A', 'D', 'C', 'B', 'E'}

# Generate set of subsections
subsec = set([x[1] for x in a])
# {'y', 'x'}

# Find missing keys by checking all combinations (saving only the section)
missing_keys = [s for s in sec for ss in subsec if [s, ss] not in a]
# ['B', 'E']
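
The test `[s, ss] not in a` scans the list `a` for every combination, and the order of the result depends on set iteration order. As a small variation (a sketch, not part of the answer above), the pairs can be kept as a set of tuples so each lookup is a hash check, and the result can be sorted and de-duplicated. The names `pairs`, `levels`, `sublevels` and `missing_levels` are made up for this example, and it assumes every key contains exactly one '-'.

```python
d = {'A-x': 1, 'A-y': 2,
     'B-x': 3,
     'C-x': 4, 'C-y': 5,
     'D-x': 6, 'D-y': 7,
     'E-x': 8}

# Set of (level, sublevel) tuples; assumes exactly one '-' per key
pairs = {tuple(k.split('-')) for k in d}
levels = {level for level, _ in pairs}
sublevels = {sub for _, sub in pairs}

# A level is "missing" if any (level, sublevel) combination is absent
missing_levels = sorted({level for level in levels
                         for sub in sublevels
                         if (level, sub) not in pairs})
print(missing_levels)
# ['B', 'E']
```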

Without using numpy, you can build your all_keys list like this:

all_keys = ['-'.join(x) for x in product(
    set(y.split('-')[0] for y in d.keys()),
    set(z.split('-')[1] for z in d.keys())
)]

Everything else remains the same. It's not any "cleaner", but it avoids pulling in all of numpy for a relatively simple task.

You don't need either numpy or itertools:

d = {'A-x': 1, 'A-y': 2,
     'B-x': 3,
     'C-x': 4, 'C-y': 5,
     'D-x': 6, 'D-y': 7,
     'E-x': 8}

first_letters = set(k.split('-')[0] for k in d)
# {'A', 'B', 'C', 'D', 'E'}
second_letters = set(k.split('-')[1] for k in d)
# {'x', 'y'}
all_keys = [f'{first_letter}-{second_letter}' for first_letter in first_letters
        for second_letter in second_letters]
# ['A-y', 'A-x', 'E-y', 'E-x', 'B-y', 'B-x', 'C-y', 'C-x', 'D-y', 'D-x']

missing_keys = set(x.split('-')[0] for x in all_keys - d.keys())
# {'B', 'E'}

Note that the missing keys are not necessarily unique per level (try adding 'F-z'), so I took the liberty of converting the result to a set.
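
To illustrate why duplicates can show up, here is a rough sketch with a smaller dict that includes 'F-z'. It groups the sublevels by level and reports which ones each level lacks; a level lacking two sublevels would appear twice in a plain list. The names `present`, `all_subs` and `missing_by_level` are invented for this example, and the dict is cut down for brevity.

```python
from collections import defaultdict

d = {'A-x': 1, 'A-y': 2,
     'B-x': 3,
     'F-z': 9}

# Map each level to the set of sublevels actually present
present = defaultdict(set)
for key in d:
    level, sub = key.split('-')
    present[level].add(sub)

# Every sublevel seen anywhere in the dict
all_subs = set().union(*present.values())

# For each incomplete level, list the sublevels it lacks
missing_by_level = {level: sorted(all_subs - subs)
                    for level, subs in present.items()
                    if subs != all_subs}
print(missing_by_level)
# {'A': ['z'], 'B': ['y', 'z'], 'F': ['x', 'y']}
```

Here 'B' and 'F' each lack two sublevels, which is exactly the case where a list-based result would repeat the level.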

Check out the solution below:

from collections import Counter

# Count how many keys each level has
counts = Counter(k.split("-")[0] for k in d)
for level, n in counts.items():
    # A level with only one key is missing its other sublevel
    if n == 1:
        print(level)

Here is a solution that reports the levels whose '-y' key is missing, as in your example. If you expect different behavior in other cases, you should clarify them in the question.

missing = []
for k in d:
    level = k.split('-')[0]
    # Record the level if its '-y' counterpart is absent
    if level + '-y' not in d:
        missing.append(level)

Hope this helps.

I think a compact solution would be to use groupby from itertools. Note that it relies on keys of the same level being adjacent in the dict and on the level being a single character:

from itertools import groupby

groups = [[key, len(list(val))] for key,val in groupby(d, lambda x: x[0])]
m = max(item[1] for item in groups)
missing = [item[0] for item in groups if item[1] < m]

Result:

missing --> ['B', 'E']
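
Because groupby only merges adjacent items and `x[0]` looks at a single character, the snippet above depends on how the dict happens to be ordered. A hedged variation that sorts the keys first and splits on '-' might look like the following; the helper `level`, the `counts` dict and the shuffled example dict are made up for this sketch.

```python
from itertools import groupby

# Keys deliberately out of order to show why sorting matters
d = {'B-x': 3, 'A-x': 1, 'C-x': 4, 'A-y': 2, 'C-y': 5}

def level(key):
    # Part before the '-' (also works for multi-character levels)
    return key.split('-')[0]

# Sort by the same function used for grouping so equal levels are adjacent
counts = {lvl: sum(1 for _ in grp)
          for lvl, grp in groupby(sorted(d, key=level), key=level)}
most = max(counts.values())
missing = [lvl for lvl, n in counts.items() if n < most]
print(missing)
# ['B']
```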

Since you clarified in your question that only '-y' keys might be missing, you can try this:

d = {'A-x': 1, 'A-y': 2,
     'B-x': 3,
     'C-x': 4, 'C-y': 5,
     'D-x': 6, 'D-y': 7,
     'E-x': 8}

# sorted() makes the output order deterministic (set order is not guaranteed)
out = sorted(k for k in {key.split('-')[0] for key in d} if k + '-y' not in d)
print(out)

Prints:

['B', 'E']
