Why are values of a two-level dictionary all pointing to the same object in Python 2.7?

I have tried to define a function to create a two-tiered dictionary, so it should produce the format

dict = {tier1:{tier2:value}}.

The code is:

def two_tier_dict_init(tier1,tier2,value):
    dict_name = {}
    for t1 in tier1:
        dict_name[t1] = {}
        for t2 in tier2:
            dict_name[t1][t2] = value
    return dict_name

So the following example...

tier1 = ["foo","bar"]
tier2 = ["x","y"]
value = []
foobar_dict = two_tier_dict_init(tier1,tier2,value)

on the face of it produces what I want:

foobar_dict =  {'foo':{'x': [],'y':[]},
                'bar':{'x': [],'y':[]}}

However, when appending any value like

foobar_dict["foo"]["x"].append("thing")

All values get appended so the result is:

foobar_dict =  {'foo':{'x': ["thing"],'y':["thing"]},
                'bar':{'x': ["thing"],'y':["thing"]}}

At first I assumed that, because of the way my function builds the dictionary, all values point to the same location in memory, but I could not figure out why that would be the case. I then discovered that if I change the value from an empty list to an integer and then do the following,

foobar_dict["foo"]["x"] +=1

only the desired value is changed.

I must therefore conclude that it has something to do with the list.append method, but I cannot figure it out. What is the explanation?

NB: I need this function to build large dictionaries of dictionaries in which each tier has hundreds of elements. I have also used the same method to build a three-tiered version, and the same issue occurs.

You only passed in one list object, and your second-tier dictionary only stored references to that one object.

If you need to store distinct lists, you need to create a new list for each entry. You could use a factory function for that:

def two_tier_dict_init(tier1, tier2, value_factory):
    dict_name = {}
    for t1 in tier1:
        dict_name[t1] = {}
        for t2 in tier2:
            dict_name[t1][t2] = value_factory()
    return dict_name

Then use:

two_tier_dict_init(tier1, tier2, list)

to have it create empty lists. You can use any callable for the value factory here, including a lambda if you want to store an immutable object like a string or an integer:

two_tier_dict_init(tier1, tier2, lambda: "I am shared but immutable")

You could use a dict comprehension to simplify your function:

def two_tier_dict_init(tier1, tier2, value_factory):
    return {t1: {t2: value_factory() for t2 in tier2} for t1 in tier1}
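As a quick check of the factory approach, the sketch below repeats the comprehension version and appends to one entry. The tier names are simply the ones from the question; because list is called once per slot, only the targeted list should change.

```python
def two_tier_dict_init(tier1, tier2, value_factory):
    # value_factory is called once per entry, so every slot gets its own object
    return {t1: {t2: value_factory() for t2 in tier2} for t1 in tier1}

foobar_dict = two_tier_dict_init(["foo", "bar"], ["x", "y"], list)
foobar_dict["foo"]["x"].append("thing")

print(foobar_dict["foo"]["x"])  # ['thing']
print(foobar_dict["foo"]["y"])  # []
print(foobar_dict["foo"]["x"] is foobar_dict["bar"]["x"])  # False
```

The single-argument print(...) calls behave the same in Python 2.7 and Python 3.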

It happens because you are filling all second-tier dicts with the same list that you passed as value, and all entries are pointing to the same list object.

One solution is to copy the list at each assignment:

dict_name[t1][t2] = value[:]

This only works if you are sure that value is always a list.

Another, more generic solution that works with any object, including nested lists and dictionaries, is deep copying (this requires import copy):

dict_name[t1][t2] = copy.deepcopy(value)

If you fill the dicts with an immutable object like a number or string, internally all entries would refer to the same object as well, but the undesirable effect would not happen because numbers and strings are immutable.

All the values refer to the same list object. When you call append() on that list object, all of the dictionary values appear to change at the same time.

To create a copy of the list, change

        dict_name[t1][t2] = value

to

        dict_name[t1][t2] = value[:]

or to

        dict_name[t1][t2] = copy.deepcopy(value)

The former makes a shallow (i.e. one-level) copy; the latter makes a deep copy and requires import copy.
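The difference between the two kinds of copy only shows up when the value itself contains mutable objects. A minimal sketch, using a made-up nested list for illustration:

```python
import copy

value = [["nested"]]          # hypothetical value holding an inner list
shallow = value[:]            # new outer list, same inner list
deep = copy.deepcopy(value)   # new outer list and new inner list

value[0].append("changed")    # mutate the inner list through the original

print(shallow)                 # [['nested', 'changed']]
print(deep)                    # [['nested']]
print(shallow is value)        # False
print(shallow[0] is value[0])  # True
```

For a flat list such as the empty list in the question, value[:] is enough; deepcopy matters once the value holds nested lists or dictionaries.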

The reason this appears to work with ints is that they are immutable, and augmented assignments (+= and friends) rebind the target just like ordinary assignment statements (possibly back to the same object). When you do this:

foobar_dict["foo"]["x"] +=1

you end up replacing the old int object with a different one. ints have no way to change value in place, so the addition builds (or possibly finds, since CPython caches small ints) a different int with the new value.

So even if foobar_dict["foo"]["x"] and foobar_dict["foo"]["y"] started out with the same int (and they did), adding to one of them makes them now contain different ints.

You can see this difference if you try it out with simpler variables:

>>> a = b = 1
>>> a is b
True
>>> a += 1
>>> a 
2
>>> b
1

On the other hand, list is mutable, and calling append doesn't do any rebinding. So, as you suspected, if foobar_dict["foo"]["x"] and foobar_dict["foo"]["y"] are the same list (and they are; check this with is), and you append to it, they are still the same list.
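The identity check can be run directly against the original function from the question. The sketch below also adds one step that is not in the question: rebinding a single entry with an ordinary assignment, to contrast it with the in-place append. The names shared and d are just illustrative.

```python
def two_tier_dict_init(tier1, tier2, value):
    # Original version: every entry stores a reference to the same object
    dict_name = {}
    for t1 in tier1:
        dict_name[t1] = {}
        for t2 in tier2:
            dict_name[t1][t2] = value
    return dict_name

shared = []
d = two_tier_dict_init(["foo", "bar"], ["x", "y"], shared)

print(d["foo"]["x"] is d["bar"]["y"])  # True
print(d["foo"]["x"] is shared)         # True

d["foo"]["x"].append("thing")          # mutates the one shared list
print(d["bar"]["y"])                   # ['thing']

# Ordinary assignment rebinds only this entry to a newly built list
d["foo"]["x"] = d["foo"]["x"] + ["other"]
print(d["foo"]["x"])                   # ['thing', 'other']
print(d["bar"]["y"])                   # ['thing']
```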
