简体   繁体   中英

clarify behaviour: collections.defaultdict vs dict.setdefault

dict provides .setdefault() , which will allow you to assign values, of any type, to missing keys on the fly:

>>> d = dict()
>>> d.setdefault('missing_key', [])
[]
>>> d
{'missing_key': []}

Whereas, if you use defaultdict to accomplish the same task, then the default value is generated on demand whenever you try to access or modify a missing key:

>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['missing_key']
[]
>>> d
defaultdict(<class 'list'>, {'missing_key': []})

However, the following piece of code implemented with defaultdict raises a KeyError instead of creating the item with default value, {} :

trie = collections.defaultdict(dict)
for word in words:
    t = trie
    for c in word:
        t = t[c]
    t["*"] = word

Using .setdefault() works ok:

trie = {}
for word in words:
    t = trie
    for c in word:
        t = t.setdefault(c, {})
    t["*"] = word

checking before access, works ok too:

trie = {}
for word in words:
    t = trie
    for c in word:
        if c not in t:
           t[c] = {}
        t = t[c]
    t["*"] = word

What am I missing when using collections.defaultdict() ?

NB I am trying to build a Trie structure out of a list of words. For example:

words = ["oath", "pea", "eat", "rain"]
trie = {'o': {'a': {'t': {'h': {'*': 'oath'}}}}, 'p': {'e': {'a': {'*': 'pea'}}}, 'e': {'a': {'t': {'*': 'eat'}}}, 'r': {'a': {'i': {'n': {'*': 'rain'}}}}}

In your first example, when you do t = t[c], t becomes a regular empty dict (because that's what you tell the defaultdict to generate in the definition of trie ).

Let's run through the loop with your example word "oath" :

1) t = trie, word = "oath"
2) c = "o"
3) t = t[c]
  3.1) evaluation of t[c] # "o" is not in trie, so trie generates an empty dict at key "o" and returns it to you
  3.2) assignment to t -> t is now the empty dict. If you were to run (t is trie["o"]), it would evaluate to True after this line
4) c = "a"
5) t = t[c]
  5.1) Evaluation of t[c] -> "a" is not in the dict t. This is a regular dict, raise KeyError.

Unfortunately, I can't think of a way to use defaultdict here ( but Marius could, see this answer ), because of the arbitrary nesting of a Trie. You'd need to define the trie as a defaultdict that, in case of missing key, generates a default dict that itself generates a default dict in case of missing key , recursively until maximum depth (which, in principle, is unknown).

IMO, the best way to implement this is with the setdefault as you did in your second example.

GPhilo's answer is perfectly fine and, indeed, I also believe that setdefault is the proper way of doing it.

However, if one prefers to use the defaultdict , it can be easily achieved like this:

def ddnestedconst(): 
    """collections.defaultdict nested-ish constructor."""
    return collections.defaultdict(ddnestedconst) 

# ...

trie = collections.defaultdict(ddnestedconst)
# or, being the same:
#trie = ddnestedconst()

for word in words:
    t = trie
    for c in word:
        t = t[c]
    t["*"] = word

May feel a little bit strange at first, but I find it perfectly readable and semantically accurate.

Reached that point, you may also prefer to create a new class altogether, inspired by defaultdict , but including all the semantics and specific behaviour that you expect from it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM