
I need a dict-like structure with two keys, where I can get a list of all objects that have a given value for either key

Let's say I have a dict that looks like this:

d = {'a': {}, 'b': {}}  # the inner dicts must exist before assigning into them
d['a']['1'] = 'foo'
d['a']['2'] = 'bar'
d['b']['1'] = 'baz'
d['b']['2'] = 'boo'

If I want to get every item where the first key is 'a', I can just do d['a'] and I get all of them. However, what if I want to get all items where the second key is '1'? The only way I can think of is to make a second dictionary with the keys in reverse order, which requires duplicating the contents. Is there a way to do this within a single structure?

Edit: I forgot to mention that I want to do this without iterating over everything. I'm going to be dealing with dicts with hundreds of thousands of keys, so I need something scalable.
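The "second dictionary with the keys in reverse order" idea from the question can be sketched as follows, using the question's sample data (the name `reverse` is illustrative). In Python the second dictionary holds references to the same value objects, so what gets duplicated is the key structure, not the values themselves.

```python
from collections import defaultdict

# Sample data from the question: first key -> second key -> value.
d = {
    'a': {'1': 'foo', '2': 'bar'},
    'b': {'1': 'baz', '2': 'boo'},
}

# Build the reversed dictionary: second key -> first key -> value.
# Only references to the existing value objects are stored, not copies.
reverse = defaultdict(dict)
for first_key, inner in d.items():
    for second_key, value in inner.items():
        reverse[second_key][first_key] = value

print(d['a'])        # {'1': 'foo', '2': 'bar'}
print(reverse['1'])  # {'a': 'foo', 'b': 'baz'}
```

Both dictionaries must be kept in sync on every insert and delete, which is the maintenance cost of this approach. Also note that looking up a missing key on a `defaultdict` with `[]` inserts an empty entry; use `reverse.get(key, {})` for read-only lookups.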

You're dealing with three dictionaries in this example: one holding the values "foo" and "bar", one holding "baz" and "boo", and an outer dictionary that maps the keys "a" and "b" to those two inner dictionaries. You can iterate over the keys of both the outer and inner dictionaries with a nested for loop:

items = []
for outer_key in d:
    for inner_key in d[outer_key]:
        if inner_key == "1":
            items.append(d[outer_key][inner_key])
            break  # No need to keep checking keys once you've found a match

If you don't care about the keys of the outer dictionary, you can use d.values() to iterate over the inner dictionaries directly and do a membership check on each one, instead of looping over every inner key:

items = []
for inner_dict in d.values():
    if "1" in inner_dict:
        items.append(inner_dict["1"])

This can also be written as a list comprehension:

items = [inner_dict["1"] for inner_dict in d.values() if "1" in inner_dict]
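Here is the comprehension wrapped in a function (the name `values_with_second_key` is hypothetical) and run against the question's sample data. It still visits every outer key, so it does not meet the "no iterating" requirement from the question's edit, but each inner dictionary costs only one hash lookup instead of a scan of its keys.

```python
def values_with_second_key(d, second_key):
    """Return every value stored under `second_key` in any inner dict.

    One membership check per outer key, so the cost grows with the
    number of outer keys rather than the total number of stored items.
    """
    return [inner[second_key] for inner in d.values() if second_key in inner]


d = {
    'a': {'1': 'foo', '2': 'bar'},
    'b': {'1': 'baz', '2': 'boo'},
}
print(values_with_second_key(d, '1'))  # ['foo', 'baz']
print(values_with_second_key(d, '3'))  # []
```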

What you want sounds very similar to a tree structure, which can be implemented as a dictionary of dictionaries. Here's a simple implementation taken from one of the answers to the question "What is the best way to implement nested dictionaries?":

class Tree(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value

    def get_second_level(self, second_key):
        found = []
        for level2 in self.values():
            if second_key in level2:
                found.append(level2[second_key])
        return found

d = Tree()
d['a']['1'] = 'foo'
d['a']['2'] = 'bar'
d['b']['1'] = 'baz'
d['b']['2'] = 'boo'
d['c']['2'] = 'mox'
d['c']['3'] = 'nix'

print(d)            # -> {'a': {'1': 'foo', '2': 'bar'}, 'b': {'1': 'baz', '2': 'boo'},
                    #     'c': {'2': 'mox', '3': 'nix'}}
print(d['a'])       # -> {'1': 'foo', '2': 'bar'}
print(d['a']['1'])  # -> foo
print(d['a']['2'])  # -> bar

print()
second_key = '1'
found = d.get_second_level(second_key)
print(f'Those with a second key of {second_key!r}')  # -> Those with a second key of '1'
print(f'  {found}')                                  # ->   ['foo', 'baz']

So after sleeping on it, the solution I came up with was to make three dicts. The main one stores the data, identified by a tuple key (d['a', '1'] = 'foo'). The other two are indexes: one maps each first key to all the second keys it is paired with (a['a'] = ['1', '2']), and the other maps each second key to all the first keys it is paired with (b['1'] = ['a', 'b']). I don't entirely like this, since the indexes still carry a hefty storage overhead and the approach doesn't scale efficiently to more keys, but it gets the job done without iterating and without duplicating the data. If anyone has a better idea, I'll be happy to hear it.
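A minimal sketch of the three-dict approach just described, assuming exactly two keys. The class name `TwoKeyDict` and its method names are hypothetical, and the indexes use sets instead of the lists in the description, so that repeated assignments don't add duplicates and deletions stay cheap. Values are stored once in the main dict; the indexes hold only keys.

```python
from collections import defaultdict


class TwoKeyDict:
    """Values under (first, second) tuple keys, plus two key indexes (sketch)."""

    def __init__(self):
        self._data = {}                   # (first, second) -> value
        self._seconds = defaultdict(set)  # first key -> set of second keys
        self._firsts = defaultdict(set)   # second key -> set of first keys

    def __setitem__(self, keys, value):
        first, second = keys
        self._data[first, second] = value
        self._seconds[first].add(second)
        self._firsts[second].add(first)

    def __getitem__(self, keys):
        return self._data[keys]

    def __delitem__(self, keys):
        first, second = keys
        del self._data[first, second]
        # Remove the index entries, and drop index keys that become empty.
        self._seconds[first].discard(second)
        if not self._seconds[first]:
            del self._seconds[first]
        self._firsts[second].discard(first)
        if not self._firsts[second]:
            del self._firsts[second]

    def by_first(self, first):
        """All {second: value} pairs whose first key is `first`."""
        # .get() on a defaultdict does not create missing entries.
        return {s: self._data[first, s] for s in self._seconds.get(first, ())}

    def by_second(self, second):
        """All {first: value} pairs whose second key is `second`."""
        return {f: self._data[f, second] for f in self._firsts.get(second, ())}


d = TwoKeyDict()
d['a', '1'] = 'foo'
d['a', '2'] = 'bar'
d['b', '1'] = 'baz'
d['b', '2'] = 'boo'
print(d['a', '1'])                       # foo
print(sorted(d.by_second('1').items()))  # [('a', 'foo'), ('b', 'baz')]
```

A lookup by either key costs time proportional to the number of matching items, not the size of the whole structure. Set iteration order is not guaranteed for strings, which is why the example sorts before printing.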
