How do I create a dictionary that indexes the location of certain keys?

Question

I have a class that inherits the dict object.

my_subclassed_dict = SubclassedDictionary({
        "id": {"value1": 144
               "value2": "steve",
               "more" {"id": 114}
        },
        "attributes": "random"
})

On initialization of SubclassedDictionary , I would like paths generated which match a certain condition.

Hypothetically, if I was to make this condition, 'index all numbers above 100' This could perhaps then access my_subclassed_dict.get_paths() , which would then return some kind of structure resembling this:

[
    ['id', 'value1'], 
    ['id', 'more', 'id',]
]

In short, how can I subclass dict which generates paths for keys matching a certain condition, on instantiation?

EDIT

Since someone asked for an example implementation. However the problem with this is that it doesn't handle nested dictionaries.

class SubclassedDictionary(dict):
    paths = []

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)  # use the free update to set keys

    def update(self, *args, **kwargs):
        temp = args[0]
        for key, value in temp.items():
            if isinstance(value, int):
                if value > 100:
                    self.paths.append(key)
        super(SubclassedDictionary, self).update(*args, **kwargs)

dictionary = {
   "value1": 333,
   "v2": 99,
   "v2": 129,
   "v3": 30,
   "nested": {
      "nested_value" 1000
   }
}

new_dict = SubclassedDictionary(dictionary)

print(new_dict.paths) # outputs: ['v2','value1']

If it did work as intended.

print(new_dict.paths)

Would output

[
   ['v2'],
   ['value1'],
   ['nested', 'nested_value']
]

Answer 1

From what I understand, you want a dictionary that is capable of returning the keys of dictionaries within dictionaries if the value the key's are associated with match a certain condition.

class SubclassedDictionary(dict):
    def __init__(self, new_dict, condition=None, *args, **kwargs):
        super(SubclassedDictionary, self).__init__(new_dict, *args, **kwargs)
        self.paths = []
        self.get_paths(condition)

    def _get_paths_recursive(self, condition, iterable, parent_path=[]):
        path = []
        for key, value in iterable.iteritems():
            # If we find an iterable, recursively obtain new paths.
            if isinstance(value, (dict, list, set, tuple)):
                # Make sure to remember where we have been (parent_path + [key])
                recursed_path = self._get_paths_recursive(condition, value, parent_path + [key])
                if recursed_path:
                    self.paths.append(parent_path + recursed_path)
            elif condition(value) is True:
                self.paths.append(parent_path + [key])

    def get_paths(self, condition=None):
        # Condition MUST be a function that returns a bool!
        self.paths = []
        if condition is not None:
            return self._get_paths_recursive(condition, self)

def my_condition(value):
    try:
        return int(value) > 100
    except ValueError:
        return False



my_dict = SubclassedDictionary({"id": {"value1": 144,
                                       "value2": "steve",
                                       "more": {"id": 114}},
                                "attributes": "random"},
                               condition=my_condition)

print my_dict.paths  # Returns [['id', 'value1'], ['id', 'more', 'id']]

There's a few benefits to this implementation. One is that you can change your condition whenever you want. In your question it sounded like this may be a feature that you were interested in. If you want a different condition you can easily write a new function and pass it into the constructor of the class, or simply call get_paths() with your new condition.

When developing a recursive algorithm there are 3 things you should take into account.

1) What is my stopping condition? In this case your literal condition is not actually your stopping condition. The recursion stops when there are no longer elements to iterate through.

2) Create a non-recursive function This is important for two reasons (I'll get to the second later). The first reason is that it is a safe way to encapsulate functionality that you don't want consumers to use. In this case the _get_paths_recursive() has extra parameters that if a consumer got a hold of could ruin your paths attribute.

3) Do as much error handling before recursion (Second reason behind two functions) The other benefit to a second function is that you can do non-recursive operations. More often than not, when you are writing a recursive algorithm you are going to have to do something before you start recursing. In this case I make sure the condition parameter is valid (I could add more checking to make sure its a function that returns a bool, and accepts one parameter). I also reset the paths attribute so you don't end up with a crazy amount of paths if get_paths() is called more than once.

Answer 2

The minimal change is something like:

class SubclassedDictionary(dict):

    def __init__(self, *args, **kwargs):
        self.paths = []  # note instance, not class, attribute
        self.update(*args, **kwargs)  # use the free update to set keys

    def update(self, *args, **kwargs):
        temp = args[0]
        for key, value in temp.items():
            if isinstance(value, int):
                if value > 100:
                    self.paths.append([key])  # note adding a list to the list
            # recursively handle nested dictionaries
            elif isinstance(value, dict):
                for path in SubclassedDictionary(value).paths:
                    self.paths.append([key]+path)
        super(SubclassedDictionary, self).update(*args, **kwargs)

Which gives the output you're looking for:

>>> SubclassedDictionary(dictionary).paths
[['v2'], ['value1'], ['nested', 'nested_value']]

However, a neater method might be to make paths a method, and create nested SubclassedDictionary instances instead of dictionaries, which also allows you to specify the rule when calling rather than hard-coding it. For example:

class SubclassedDictionary(dict):

    def __init__(self, *args, **kwargs):
        self.update(*args, **kwargs)  # use the free update to set keys

    def update(self, *args, **kwargs):
        temp = args[0]
        for key, value in temp.items():
            if isinstance(value, dict):
                temp[key] = SubclassedDictionary(value)
        super(SubclassedDictionary, self).update(*args, **kwargs)

    def paths(self, rule):
        matching_paths = []
        for key, value in self.items():
            if isinstance(value, SubclassedDictionary):
                for path in value.paths(rule):
                    matching_paths.append([key]+path)
            elif rule(value):
                matching_paths.append([key])
        return matching_paths

In use, to get the paths of all integers larger than 100 :

>>> SubclassedDictionary(dictionary).paths(lambda val: isinstance(val, int) and val > 100)
[['v2'], ['value1'], ['nested', 'nested_value']]

One downside is that this recreates the path list every time it's called.

It's worth noting that you don't currently handle the kwargs correctly (so neither does my code!); have a look at eg Overriding dict.update() method in subclass to prevent overwriting dict keys where I've provided an answer that shows how to implement an interface that matches the basic dict 's. Another issue that your current code has is that it doesn't deal with keys subsequently being removed from the dictionary; my first snippet doesn't either, but as the second rebuilds the path list each time it's not a problem there.

How do I create a dictionary that indexes the location of certain keys?

Question

2 answers

solution1
2 ACCPTED 2015-05-27 17:11:58

solution2
0 2015-05-27 17:25:11

How do I create a dictionary that indexes the location of certain keys?

Question

2 answers

solution1 2 ACCPTED 2015-05-27 17:11:58

solution2 0 2015-05-27 17:25:11

solution1
2 ACCPTED 2015-05-27 17:11:58

solution2
0 2015-05-27 17:25:11