
Deleting a word from a specific implementation of a trie in Python

I am kind of new to data structures, and I am implementing a trie to disambiguate a database of names using edit distance. I am using the following trie implementation:

http://stevehanov.ca/blog/index.php?id=114

which is basically:

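NodeCount = 0   # total nodes created (module-level counter updated by TrieNode)
WordCount = 0   # total names inserted

class TrieNode:

    def __init__(self):
        self.word = None      # the full word if one ends at this node, else None
        self.children = {}    # maps a letter to the child TrieNode

        global NodeCount
        NodeCount += 1

    def insert(self, word):
        node = self
        for letter in word:
            # Create the child for this letter if it does not exist yet
            if letter not in node.children:
                node.children[letter] = TrieNode()

            node = node.children[letter]

        # Mark the node where the word ends
        node.word = word

# read the list of names into a trie
names = ["john", "jon", "jane"]   # hypothetical sample; use the names from your database
trie = TrieNode()
for name in names:
    WordCount += 1
    trie.insert(name)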

This does the job beautifully: it inserts all the names into the trie. Now I go through my list of names one by one and use the trie to return a list of all names within a certain edit distance of the given name. I then want to delete every name in that returned list from the trie.
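The edit-distance lookup mentioned here is what the linked post covers. As a rough sketch of that technique (a reconstruction, not the post's own code), a depth-first walk can compute one row of the Levenshtein table per trie node and skip any subtree whose row already exceeds the limit. The names `search` and `_search` are illustrative, and the class repeats the trie above without the counters.

```python
class TrieNode:
    def __init__(self):
        self.word = None     # full word if one ends at this node
        self.children = {}   # letter -> TrieNode

    def insert(self, word):
        node = self
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.word = word


def search(trie, target, max_cost):
    """Return (word, distance) pairs for stored words within max_cost edits of target."""
    first_row = list(range(len(target) + 1))  # distance from "" to each prefix of target
    results = []
    for letter, child in trie.children.items():
        _search(child, letter, target, first_row, results, max_cost)
    return results


def _search(node, letter, target, previous_row, results, max_cost):
    # One new row of the Levenshtein table for the trie prefix ending in `letter`.
    current_row = [previous_row[0] + 1]
    for col in range(1, len(target) + 1):
        insert_cost = current_row[col - 1] + 1
        delete_cost = previous_row[col] + 1
        replace_cost = previous_row[col - 1] + (0 if target[col - 1] == letter else 1)
        current_row.append(min(insert_cost, delete_cost, replace_cost))

    # A stored word ends here and is close enough to the target.
    if node.word is not None and current_row[-1] <= max_cost:
        results.append((node.word, current_row[-1]))

    # Descend only if some cell is still within budget; otherwise prune the subtree.
    if min(current_row) <= max_cost:
        for next_letter, child in node.children.items():
            _search(child, next_letter, target, current_row, results, max_cost)


trie = TrieNode()
for name in ["john", "jon", "joan", "jane", "bob"]:
    trie.insert(name)

print(sorted(search(trie, "john", 1)))  # [('joan', 1), ('john', 0), ('jon', 1)]
```

The pruning test is safe because no cell in a later row can be smaller than the minimum of the current row, so once every cell exceeds `max_cost` the whole subtree can be skipped.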

Is there a fast way to do that?

Thanks!

There are two ways to do this, depending on whether you want to check whether you're removing the last path through an internal node and prune it (which makes removals slightly slower, but can make later searches slightly faster). Both are trivial to write recursively, but if you want to unroll it iteratively (as your insert does), not checking is easier, so I'll do that.
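For comparison, here is a sketch of the checking variant written recursively: after clearing the word, each level reports whether its node ended up empty, so the parent can drop it. The helper `_delete` and its `(removed, prunable)` return pair are illustrative names, and the class repeats the asker's trie without the counters.

```python
class TrieNode:
    def __init__(self):
        self.word = None     # full word if one ends at this node
        self.children = {}   # letter -> TrieNode

    def insert(self, word):
        node = self
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.word = word

    def delete(self, word):
        """Remove word, pruning nodes that no longer lead to any word. True if it was present."""
        removed, _ = self._delete(word, 0)
        return removed

    def _delete(self, word, depth):
        # Returns (removed, prunable): whether the word was found below this node,
        # and whether this node is now empty so its parent can drop it.
        if depth == len(word):
            if self.word is None:
                return False, False
            self.word = None
        else:
            letter = word[depth]
            child = self.children.get(letter)
            if child is None:
                return False, False
            removed, prunable = child._delete(word, depth + 1)
            if not removed:
                return False, False
            if prunable:
                del self.children[letter]
        return True, self.word is None and not self.children


trie = TrieNode()
for name in ["ann", "anna", "bob"]:
    trie.insert(name)

trie.delete("anna")           # trailing "a" node is pruned; "ann" survives
trie.delete("ann")            # now the whole "a..." branch is empty and removed
print(sorted(trie.children))  # ['b']
```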

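def delete(self, word):
    # Add this as a method of TrieNode. It only clears the word marker; nodes that
    # no longer lead to any word are left in place (no pruning). Deleting the child
    # node instead would also drop longer names that share this prefix.
    node = self
    for letter in word:
        if letter not in node.children:
            return False
        node = node.children[letter]
    if node.word is None:
        return False          # word is only a prefix of other names, not stored itself
    node.word = None
    return True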

Can you make this faster? Yes, but it may not matter.

First, you know that the nodes will always exist (you are deleting names your search just returned), so you can remove some of the error checking. More importantly, if you can make the search function return the nodes instead of just their values, that will make things a little faster. If you can add backlinks up the trie (a parent pointer on each node), you can erase the node in constant time instead of repeating the search. If you don't want backlinks, you can get exactly the same benefit by returning a zipper instead of a node, or, more simply, by returning a stack of nodes.
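A minimal sketch of the "return a stack of nodes" idea, assuming your search can hand back the path it walked. `find_path` and `delete_path` are hypothetical helpers; here an exact lookup stands in for the edit-distance search, and `delete_path` prunes using only that stack, so nothing is re-walked from the root.

```python
class TrieNode:
    def __init__(self):
        self.word = None     # full word if one ends at this node
        self.children = {}   # letter -> TrieNode

    def insert(self, word):
        node = self
        for letter in word:
            node = node.children.setdefault(letter, TrieNode())
        node.word = word

    def find_path(self, word):
        """Stack [(None, root), (letter, node), ...] ending at word's node, or None if absent."""
        path = [(None, self)]
        node = self
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return None
            path.append((letter, node))
        return path if node.word is not None else None


def delete_path(path):
    """Clear the word at the top of the stack, then pop upward pruning empty nodes."""
    path[-1][1].word = None
    for i in range(len(path) - 1, 0, -1):
        letter, node = path[i]
        if node.word is not None or node.children:
            break  # still leads to some word, so every ancestor is needed too
        parent = path[i - 1][1]
        del parent.children[letter]


trie = TrieNode()
for name in ["car", "cart", "cat"]:
    trie.insert(name)

path = trie.find_path("cart")  # in practice, recorded by the edit-distance search
delete_path(path)
print(trie.find_path("cart"))             # None
print(trie.find_path("car") is not None)  # True
```

With backlinks you would store a `parent` reference (and the letter) on each node instead of carrying the stack; the pruning loop would look the same.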

But really, the worst case here is just doubling the work, not increasing the algorithmic complexity or multiplying it by a large factor, so the simple approach probably wins.

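Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.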

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM