Count the number of words that appear before

I would like to ask: how can we count the number of words in a trie that come alphabetically before a given string?

Here is my current implementation:

class TrieNode:
    # Trie node class
    def __init__(self):
        self.children = [None] * 26
        # isEndOfWord is True if the node represents the end of a word
        self.isEndOfWord = False
        self.word_count = 0


class Trie:
    # Trie data structure class
    def __init__(self):
        self.root = self.getNode()

    def getNode(self):
        # Returns a new trie node (children initialized to None)
        return TrieNode()

    def _charToIndex(self, ch):
        # private helper function
        # Converts the current key character into an index
        # use only 'a' through 'z' and lower case
        return ord(ch) - ord('a')

    def insert(self, key):
        # If not present, inserts key into trie
        # If the key is a prefix of an existing key,
        # just marks its last node as the end of a word
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
            # if current character is not present
            if not pCrawl.children[index]:
                pCrawl.children[index] = self.getNode()
            pCrawl = pCrawl.children[index]
        # mark last node as leaf
        pCrawl.isEndOfWord = True
        pCrawl.word_count += 1

    def search(self, key):
        # Search key in the trie
        # Returns True if key is present
        # in trie, else False
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            index = self._charToIndex(key[level])
            if not pCrawl.children[index]:
                return False
            pCrawl = pCrawl.children[index]
        return pCrawl is not None and pCrawl.isEndOfWord

    def count_before(self, string):
        cur = self.root
        for b in string:
            index = self._charToIndex(b)
            cur = cur.children[index]
            if cur is None:
                return 0
        return cur.word_count


def total_before(text):
    t = Trie()
    for word in text:
        t.insert(word)

    a_list = []  # for each word, the number of words that occur before it
    for word in text:
        result = t.count_before(word)
        a_list.append(result)
    return a_list


total_before(["bac", "aaa", "baa", "aac"])  # Desired output: [3, 0, 2, 1]

I would like to know how I can count the number of words that occur before a given string in the trie I have created. Can someone give me an idea?

As word_count is currently maintained, it does not serve much purpose: it is only non-zero at nodes with isEndOfWord set to True. It would be more useful if it counted the number of words that depend on the current node, i.e. words that either end at that node (which your code counts now) or continue further down the trie (which are currently not counted).

To make that happen, also increment word_count at every node you pass through as you descend the trie:

    def insert(self, key):
        pCrawl = self.root
        length = len(key)
        for level in range(length):
            pCrawl.word_count += 1   # <-------------- added
            index = self._charToIndex(key[level])
            if not pCrawl.children[index]:
                pCrawl.children[index] = self.getNode()
            pCrawl = pCrawl.children[index]
        pCrawl.isEndOfWord = True
        pCrawl.word_count += 1
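To see what the modified counts mean, here is a minimal standalone sketch that uses module-level functions instead of the Trie methods (count_at is a hypothetical helper, not part of either post). It inserts the example words the same way as the modified insert and reads word_count at a few nodes: each node ends up holding the number of words whose path passes through or ends at it.

```python
class TrieNode:
    def __init__(self):
        self.children = [None] * 26  # one slot per letter 'a'..'z'
        self.isEndOfWord = False
        self.word_count = 0          # words passing through or ending here


def insert(root, key):
    # same logic as the modified insert: bump the count on every node
    # along the path (root included), then on the final node
    node = root
    for ch in key:
        node.word_count += 1
        index = ord(ch) - ord('a')
        if node.children[index] is None:
            node.children[index] = TrieNode()
        node = node.children[index]
    node.isEndOfWord = True
    node.word_count += 1


def count_at(root, prefix):
    # hypothetical helper: word_count of the node reached by `prefix`, or 0
    node = root
    for ch in prefix:
        node = node.children[ord(ch) - ord('a')]
        if node is None:
            return 0
    return node.word_count


root = TrieNode()
for w in ["bac", "aaa", "baa", "aac"]:
    insert(root, w)

print(count_at(root, ""))     # 4: every word passes through the root
print(count_at(root, "a"))    # 2: "aaa" and "aac"
print(count_at(root, "ba"))   # 2: "bac" and "baa"
print(count_at(root, "bac"))  # 1
```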

In count_before you need to sum the word_count values of all child nodes that precede the child you are about to select, as those represent words that come before the current word:

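    def count_before(self, string):
        count = 0  # used to accumulate the word_counts
        cur = self.root
        for b in string:
            index = self._charToIndex(b)
            # words that end at this node are proper prefixes of `string` (e.g. "ab"
            # for "abc"), so they come before it too; their number is this node's
            # word_count minus the word_counts of all its children:
            count += cur.word_count - sum(node.word_count for node in cur.children if node)
            # add the word counts of the children that are to the left of this index:
            count += sum(node.word_count for node in cur.children[:index] if node)
            cur = cur.children[index]
            if cur is None:
                break
        return count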
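For reference, here is a self-contained sketch that puts the pieces together; it is not the answerer's exact code. It assumes lowercase 'a' to 'z' input as in the question, and it stores an explicit end_count per node (a field that is not in the original posts) instead of deriving it, so that words which are proper prefixes of the query (e.g. "ab" before "abc") and duplicate words are counted correctly. The last lines cross-check the trie against a brute-force count.

```python
from typing import List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: List[Optional["TrieNode"]] = [None] * 26
        self.word_count = 0  # words whose path passes through or ends at this node
        self.end_count = 0   # assumption: words ending exactly here (duplicates included)


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()

    @staticmethod
    def _index(ch: str) -> int:
        # assumes lowercase 'a'..'z' only, as in the question
        return ord(ch) - ord('a')

    def insert(self, key: str) -> None:
        node = self.root
        node.word_count += 1
        for ch in key:
            i = self._index(ch)
            if node.children[i] is None:
                node.children[i] = TrieNode()
            node = node.children[i]
            node.word_count += 1
        node.end_count += 1

    def count_before(self, s: str) -> int:
        count = 0
        node = self.root
        for ch in s:
            # words ending here are proper prefixes of s, so they sort before it
            count += node.end_count
            i = self._index(ch)
            # every word under a smaller sibling sorts before s
            count += sum(c.word_count for c in node.children[:i] if c)
            node = node.children[i]
            if node is None:
                break
        return count


def total_before(words: List[str]) -> List[int]:
    t = Trie()
    for w in words:
        t.insert(w)
    return [t.count_before(w) for w in words]


print(total_before(["bac", "aaa", "baa", "aac"]))  # [3, 0, 2, 1]

# cross-check against a brute-force count on input with prefixes and duplicates
words = ["ab", "abc", "b", "ab", "a", "abd"]
assert total_before(words) == [sum(o < w for o in words) for w in words]
```

Words equal to the query are deliberately not counted, since the end_count of the final node is never added.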

This line:

count += sum(node.word_count for node in cur.children[:index] if node)

is a compact way of doing this:

mysum = 0
for node in cur.children[:index]:
    if node:
        mysum += node.word_count
count += mysum

I think you overcomplicated the problem.

def total_before(lst):
    return [sorted(lst).index(el) for el in lst]

print(total_before(["bac", "aaa", "baa", "aac"])) 

Output:

[3, 0, 2, 1]
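The one-liner above calls sorted(lst) once per element, and list.index is a linear scan, so it does roughly O(n² log n) work. A sketch of the same idea that sorts only once and uses bisect_left from the standard library bisect module: bisect_left returns the number of items in the sorted list that are strictly less than el, which is exactly how many words come before it, with equal words not counted, the same value sorted(lst).index(el) gives.

```python
from bisect import bisect_left
from typing import List


def total_before(lst: List[str]) -> List[int]:
    ordered = sorted(lst)  # sort once: O(n log n)
    # bisect_left gives the count of elements strictly smaller than el
    return [bisect_left(ordered, el) for el in lst]


print(total_before(["bac", "aaa", "baa", "aac"]))  # [3, 0, 2, 1]
```

Each lookup is O(log n), so the whole function is O(n log n).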

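Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.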

 