简体   繁体   中英

Return strings that matches the prefix in a Trie

I managed to construct a Trie and now i want to return strings that matches a prefix from the Trie but I'm having issues writing the search function.

For example, if i have a prefix "aa", i want to have the strings "aa" and "aac" as an output.

class Node:
    def __init__(self):
        self.children = [None] * 26
        self.end = False
        self.value = ""

class Trie:
    def __init__(self):
        self.root = Node()

    def add_word(self, key):
        word_length = len(key)
        current = self.root
        for i in range(word_length):
            position = self.ord_char(key[i])

            if current.children[position] is None:
                current.children[position] = Node()
            current = current.children[position]
            current.value = key[i]
        current.end = True

    def ord_char(self,key):
        ord_rep = ord(key) - ord('a')
        return ord_rep

    def prefix_search(self, prefix):
        lst = []
        current = self.root
        prefix_length = len(prefix)
        for c in range(prefix_length):
            c_position = self.ord_char(prefix[c])
            current = current.children[c_position]
            lst.append(current.value)
        #doesnt seem like I'm doing it right

if __name__ == "__main__":
    trie = Trie()
    trie.add_word("aa")
    trie.add_word("aac")
    trie.add_word("b")
    trie.prefix_search("aa")

I was thinking of just joining up the alphabets together to form the final string through the search function but i just couldn't think of any better way to do so.

The lst value so far is just the prefix, split up into separate letters, but now you need to process each node you find in the children attribute that is not None , to find all nodes that have end set to True . Each time you find such a node, you have a complete word. Any node can have multiple child nodes again, branching out into still more words to output.

You could use a stack to keep track of all the nodes you need to process to build the list, together with their prefix so far. Add child nodes to the stack with the prefix for that node, and process these nodes one by one (adding more child nodes to the stack as you do so).

Note that to start with, you don't need to build a list of prefix characters, you already have that prefix as a variable. To get to your starting point it is simpler to just iterate over the prefix itself:

def prefix_search(self, prefix):
    current = self.root
    # get to the starting point
    for c in prefix:
        current = current.children[self.ord_char(c)]
        if current is None:
            # prefix doesn't exist, abort with an empty list
            return []

    found = []
    stack = [(current, prefix)]
    while stack:
        current, prefix = stack.pop()

        if current.end:
            # this is a complete word, named by prefix
            found.append(prefix)

        # add the children to the stack, each with their letter added to the
        # prefix value.
        for child in current.children:
            if child is None:
                continue
            stack.append((child, prefix + child.value))

    return found

For your given sample trie and prefix, the stack starts with the node at 'aa' . The first while stack: iteration removes that node from the stack and because this node has end set to true, 'aa' is added to found . The node has just one non- None child node, for c , and so the node is added to the stack with 'aac' .

Then the while loop repeats, finds that one element on the stack, sees end is set so 'aac' is added to found , and no more child nodes are located. The stack stays empty and the while loop ends.

Demo:

>>> trie = Trie()
>>> trie.add_word("aa")
>>> trie.add_word("aac")
>>> trie.add_word("b")
>>> trie.prefix_search("aa")
['aa', 'aac']
>>> trie.prefix_search("b")
['b']
>>> trie.add_word('abracadabra')
>>> trie.add_word('abbreviation')
>>> trie.add_word('abbreviated')
>>> trie.add_word('abbrasive')
>>> trie.prefix_search("ab")
['abracadabra', 'abbreviation', 'abbreviated', 'abbrasive']
>>> trie.prefix_search("abr")
['abracadabra']
>>> trie.prefix_search("abb")
['abbreviation', 'abbreviated', 'abbrasive']
>>> trie.prefix_search("abbra")
['abbrasive']

.startswith()怎么样,在我看来,实现您的搜索的简单方法。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM