简体   繁体   中英

In python, how does the following AutoVivification class work?

In searching for a way of working with nested dictionaries, I found the following code posted by nosklo , which I would like to have explained, please.

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value

Testing:

a = AutoVivification()

a[1][2][3] = 4
a[1][3][3] = 5
a[1][2]['test'] = 6

print a

Output:

{1: {2: {'test': 6, 3: 4}, 3: {3: 5}}}

I'm a pretty newbie programmer. I have learned most of what I know on my own time on the side, with my only formal training being on Turbo Pascal back in high school. I understand and am able to use classes in simple ways, such as using __init__ , class methods, and storing data within instances of the class with foo.man = 'choo' .

I have no idea how the series of square brackets get directed, correctly, through the class (I presume they are calling __getitem__ somehow) and don't understand how they each get handled so concisely without having to call the method three times individually.

I was under the impression that the (dict) in the class declaration would be handled by an __init__ .

I've used try: except: before, though again, in quite simple ways. It looks to me like the try , when it runs, is calling a series of function __getitem__ . I gather that if the current level's dictionary exists, the try will pass and go to the next dictionary. The except , I gather, runs when there's a KeyError but I haven't seen self used like that before. Self 's being treated like a dictionary while I thought self was an instance of class AutoVivification ... is it both? I have never assigned twice in a row like this foo = man = choo but suspect that value is pointing to self[item] while self[item] points to the result of type(self) . But type(self) would return something like this: <class '__main__.AutoVivification'> wouldn't it? I have no idea what the extra round brackets at the end there are for. Because I don't know how the function is being called, I don't understand where value is being returned.

Sorry for all the questions! There is so much in this that I don't understand and I don't know where to look it up short of reading through the documentation for hours in which I'd retain very little. This code looks like it'll serve my purposes but I want to understand it before using it.

In case you want to know what I'm trying to do in my program with nested dictionaries: I'm trying to hold map data on an astronomical scale. While I can't create dictionaries/lists of 10^6 items nested 4 times (that would be 10^24 items!), the space is mostly empty so I can leave the empty values out completely and only assign when there's something there. What was stumping me was an efficient way of handling the dictionaries.

Line by line:

class AutoVivification(dict):

We make a subclass of dict , so AutoVivification is a kind of dict , with some local changes.

def __getitem__(self, item):

The __getitem()__ hook is called whenever someone tries to access an item on the instance through [...] index lookups. So whenever someone does object[somekey] , type(object).__getitem__(object, somekey) is called.

We'll skip the try for a moment, next line is:

 return dict.__getitem__(self, item)

This calls the unbound method __getitem__() , and passes in our own instance to it, together with the key. In other words, we call the original __getitem__ as defined by our parent class dict .

Now, we all know what happens if there is no item key in a dictionary, a KeyError is raised. This is where the try: , except KeyError combo comes in:

    try:
        return dict.__getitem__(self, item)
    except KeyError:
        value = self[item] = type(self)()
        return value

So, if the current instance (which is a sub-type of dict ) doesn't have a given key, it'll catch the KeyError exception the original dict.__getitem__() method throws, and instead we create a new value, store that in self[item] and return that value.

Now, remember that self is a (subclass) of dict , so it's a dictionary. It thus can assign new values (for which it'll use the __setitem__ hook , incidentially), and in this case it creates a new instance of the same type as self . That's another dict subclass.

So what happens in detail when we call a[1][2][3] = 4 ? Python goes through this step by step:

  1. a[1] leads to type(a).__getitem__(a, 1) . The custom __getitem__ method of AutoVivification catches the KeyError , creates a new instance of AutoVivification , stores that under the key 1 and returns it.

  2. a[1] returned an empty AutoVivification instance. The next item access [2] is called on that object, and we repeat what happened in step 1; there is a KeyError , a new instance of AutoVivification is created, stored under the 2 key, and that new instance is returned to the caller.

  3. a[1][2] returned an empty AutoVivification instance. The next item access [3] is called on that object, and we repeat what happened in step 1 (and in step 2). There is a KeyError , a new instance of AutoVivification is created, stored under the 3 key, and that new instance is returned to the caller.

  4. a[1][2][3] returned an empty AutoVivification instance. Now we store a new value in that instance, 4 .

Once you go to your next line of code, a[1][3][3] = 5 , the top-level AutoVivification instance already has a 1 key, and the return dict.__getitem__(self, item) line will return the corresponding value, which happens to be the AutoVivification instance created in step one above.

From there, the [3] item access call will create a new AutoVivification instance again (because the object at a[1] only has a 2 key), and we go through all the same steps again.

See the object.__getitem__ documentation, for a start.

The class AutoVivification(dict) declaration makes AutoVivification a subclass of dict , so it behaves the same was as dict would unless it explicitly overrides some behavior--as this class does when it overrides __getitem__ .

The call to dict.__getitem__(self, item) would typically be written instead as:

super(AutoVivification, self).__getitem__(item)

(At least in Python 2.x; Python 3 has better syntax.) Either way, what this does is try letting the default dict behavior run, but implement a fallback in case that doesn't work.

type(self)() first looks up the class object corresponding to the self instance, and then calls the class object--which in this case is the same as writing AutoVivification() , which should look much more familiar.

Hope that clears it up for you!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM