I have two doubts regarding the below implementation of a trie data structure.
Doubt 1:
I am having a hard time understanding the insert function in a trie. This is the insert word function:
def add(self, word):
    cur = self.head
    for ch in word:
        if ch not in cur:
            cur[ch] = {}
        cur = cur[ch]
    # * denotes the Trie has this word as item
    # if * doesn't exist, Trie doesn't have this word but as a path to longer word
    cur['*'] = True
Why is an empty dictionary created inside the if statement?
Also, what is the significance of cur = cur[ch]?
Please help me understand those lines around the if statement in the code.
Doubt 2:
I am trying to print all the nodes present inside the trie, but it prints the object itself, like <__main__.Trie object at 0x7f655de1c9e8>. Can someone please help me print the nodes of the trie?
Below is the code.
class Trie:
    head = {}

    def add(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                cur[ch] = {}
            cur = cur[ch]
        # * denotes the Trie has this word as item
        # if * doesn't exist, Trie doesn't have this word but as a path to longer word
        cur['*'] = True

    def search(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                return False
            cur = cur[ch]
        if '*' in cur:
            return True
        else:
            return False

dictionary = Trie()
dictionary.add("hi")
dictionary.add("hello")
print(dictionary)  # <__main__.Trie object at 0x7f655de1c9e8>
1) The if statement checks whether the current character already has its own dictionary at the current depth; if it does not, an empty dictionary is created for it. The line cur = cur[ch] then moves cur one level deeper, to the place where the next character of word belongs.
2) To display the contents of the Trie, add a __str__ method to Trie.
For example:
def __str__(self):
    # code that builds and returns a string describing the trie, e.g.:
    return str(self.head)
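As a sketch of what such a method could contain, the version below walks the nested dictionaries depth-first and lists every stored word. It assumes the '*' end marker from the question; the helper name `_words` is hypothetical and not part of the original code.

```python
class Trie:
    def __init__(self):
        self.head = {}

    def add(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                cur[ch] = {}
            cur = cur[ch]
        cur['*'] = True  # end-of-word marker, as in the question

    def _words(self, node, prefix):
        # Hypothetical helper: yield every word stored at or below `node`.
        for key in sorted(node):
            if key == '*':
                yield prefix
            else:
                yield from self._words(node[key], prefix + key)

    def __str__(self):
        return ", ".join(self._words(self.head, ""))


dictionary = Trie()
dictionary.add("hi")
dictionary.add("hello")
print(dictionary)  # hello, hi
```

Listing words is only one option; returning the raw nested dict is closer to "printing the nodes" if that is what you need.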
Initially, you have a plain old empty dictionary, head = {}. Your first word is "cat". You create a reference to head called cur. This is necessary since we'll be traversing the structure and would lose our reference to the outermost head if we didn't use a temporary variable. Modifications made through cur will be reflected in head. We need to add a ch = "c" key to the empty dict as the first letter of "cat", but this key doesn't exist yet, so we create it. head / cur now looks like:
head = {"c": {}}
cur = head
Then, the line cur = cur[ch] executes. ch is "c", so this is the same as cur = cur["c"]. cur has just moved down a level of the trie and we step the for loop to the next character, which is "a". We're back to the same scenario: cur = {} and we need to add the "a" key, so we do:
head = {"c": {"a": {}}}
cur = head["c"]
cur = cur["a"] runs and the same thing repeats for the next iteration:
head = {"c": {"a": {"t": {}}}}
cur = head["c"]["a"]
Finally the loop ends, we set the flag key "*", and we're done adding "cat". Our result is:
head = {"c": {"a": {"t": {"*": True}}}}
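The steps above can be reproduced with a small tracing script. This is only an illustrative sketch: the function name `add_traced` is made up for this example, and it works on a plain dict rather than on the Trie class.

```python
import copy


def add_traced(head, word):
    """Insert word into the dict-based trie `head`, recording head after each step."""
    snapshots = []
    cur = head  # cur is another name for the same dict object as head
    for ch in word:
        if ch not in cur:
            cur[ch] = {}  # create the missing child node
        cur = cur[ch]  # step down one level
        snapshots.append(copy.deepcopy(head))
    cur["*"] = True  # mark the end of the word
    snapshots.append(copy.deepcopy(head))
    return snapshots


head = {}
for step in add_traced(head, "cat"):
    print(step)
# {'c': {}}
# {'c': {'a': {}}}
# {'c': {'a': {'t': {}}}}
# {'c': {'a': {'t': {'*': True}}}}
```

Note that head itself is never reassigned; every change is made through cur, which always points at some dict nested inside head.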
Now, let's call trie.add("cart"). I'll just show the updates:
head = {"c": {"a": {"t": {"*": True}}}}
cur = head
head = {"c": {"a": {"t": {"*": True}}}}
cur = head["c"]
head = {"c": {"a": {"t": {"*": True}}}}
cur = head["c"]["a"]
head = {
"c": {
"a": {
"r": {},
"t": {"*": True}
}
}
}
cur = head["c"]["a"]["r"]
head = {
"c": {
"a": {
"r": {
"t": {}
},
"t": {"*": True}
}
}
}
cur = head["c"]["a"]["r"]["t"]
Finally:
head = {
"c": {
"a": {
"r": {
"t": {"*": True}
},
"t": {"*": True}
}
}
}
We've created an n-ary tree-like data structure (since there are multiple "roots", it's not exactly a tree, but by adding a dummy root node with head's contents as its children it'd be a legitimate tree).
Hopefully this makes sense. Try adding "car" next and see what happens.
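If you want to check your answer to that exercise, here is a minimal sketch using a standalone `add` function on a plain dict (an assumption made for brevity; the logic is the same as the method in the question):

```python
def add(head, word):
    cur = head
    for ch in word:
        if ch not in cur:
            cur[ch] = {}
        cur = cur[ch]
    cur["*"] = True


head = {}
for word in ("cat", "cart", "car"):
    add(head, word)

# No new nodes are created for "car": the "r" node already exists,
# so it simply gains the end-of-word flag next to its "t" child.
print(head["c"]["a"]["r"])  # {'t': {'*': True}, '*': True}
```

This is why the flag is needed at all: without it, there would be no way to tell that "car" is a stored word rather than just a prefix of "cart".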
When you print an object, print calls the object's magic __str__ method. If the class doesn't define one, it inherits the default from object, which falls back to __repr__ and simply shows the class name and the memory location of the object. This is useful for comparing object references quickly, but if you want to show the object's data, you need to implement it yourself. Probably the easiest way for your purposes is to dump the head dict to a string:
import json

class Trie:
    def __str__(self):
        return json.dumps(self.head, sort_keys=True, indent=4)
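To see the difference in isolation, here is a small sketch with two throwaway classes (`WithoutStr` and `WithStr` are hypothetical names used only for this illustration):

```python
class WithoutStr:
    def __init__(self):
        self.head = {"h": {"i": {"*": True}}}


class WithStr(WithoutStr):
    def __str__(self):
        return str(self.head)


print(WithoutStr())  # something like <__main__.WithoutStr object at 0x...>
print(WithStr())     # {'h': {'i': {'*': True}}}
```

The exact address in the first line varies from run to run, which is why it tells you nothing about the data stored inside.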
A bit ironically, had you been able to pretty-print the trie, doubt 1 would be easier to resolve by dumping the structure inside the loop.
Give Trie an initializer so that head is not a class attribute shared by all instances.
Code like if '*' in cur: return True else: return False is poor style. Simply return '*' in cur.
cur['*'] = True is a brittle design that will lead to bugs for words with "*" characters in them. Prefer a key like None that cannot possibly be a single character in a string.

import json

class Trie:
    end_mark = None

    def __init__(self):
        self.head = {}

    def add(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                cur[ch] = {}
            cur = cur[ch]
        cur[Trie.end_mark] = True

    def __contains__(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                return False
            cur = cur[ch]
        return Trie.end_mark in cur

    def __str__(self):
        # sort_keys is left out here: sorting raises TypeError once a node
        # holds both the None marker and str keys (e.g. "car" and "cart").
        return json.dumps(self.head, indent=4)

if __name__ == "__main__":
    trie = Trie()
    trie.add("cat")
    trie.add("cart")
    print(trie)           # the None marker is serialized as the key "null"
    print("cat" in trie)  # True
    print("car" in trie)  # False
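Two of the remarks above, the shared head and the fragile '*' marker, can be checked directly against the design from the question. The class below is a sketch that mirrors the question's code under the hypothetical name `SharedTrie`:

```python
class SharedTrie:
    head = {}  # class attribute: one dict shared by every instance

    def add(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                cur[ch] = {}
            cur = cur[ch]
        cur['*'] = True

    def search(self, word):
        cur = self.head
        for ch in word:
            if ch not in cur:
                return False
            cur = cur[ch]
        return '*' in cur


first = SharedTrie()
second = SharedTrie()

# Pitfall 1: both instances write into the same dict.
first.add("hi")
print(second.search("hi"))  # True, although "hi" was only added to `first`

# Pitfall 2: '*' doubles as an ordinary character.
first.add("a*")
print(first.search("a"))  # True, although "a" was never added
```

Moving head into __init__ addresses the first problem, and a marker such as None that can never be a character of a string addresses the second.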