简体   繁体   中英

Counting word strokes while parsing Trie tree

I'm trying to solve the keyboard autocompletion problem described here. The problem is to calculate how many keystrokes a word requires, given some dictionary and autocomplete rules. For example, for the dictionary:

data = ['hello', 'hell', 'heaven', 'goodbye']

We get the following results (please refer to the link above for further explanations):

{'hell': 2, 'heaven': 2, 'hello': 3, 'goodbye': 1}

Quick explanation: if the user types h , then e is autocompleted because all words starting with h also have e as second letter. Now if the user types in l , the other l is filled, giving 2 strokes for the word hell . Of course, hello would require one more stroke. Please, see the link above for more examples.

My Trie code is the following, and it works fine (taken from https://en.wikipedia.org/wiki/Trie ). The Stack code is to parse the tree from root (see edit below):

class Stack(object):
    def __init__(self, size):
        self.data = [None]*size
        self.i = 0
        self.size = size

    def pop(self):
        if self.i == 0:
            return None
        item = self.data[self.i - 1]
        self.i-= 1
        return item

    def push(self, item):
        if self.i >= self.size:
            return None
        self.data[self.i] = item
        self.i+= 1
        return item

    def __str__(self):
        s = '# Stack contents #\n'
        if self.i == 0:
            return
        for idx in range(self.i - 1, -1, -1):
            s+= str(self.data[idx]) + '\n'
        return s

class Trie(object):
    def __init__(self, value, children):
        self.value = value #char
        self.children = children #{key, trie}

class PrefixTree(object):
    def __init__(self, data):
        self.root = Trie(None, {})
        self.data = data

        for w in data:
            self.insert(w, w)

    def insert(self, string, value):
        node = self.root
        i = 0
        n = len(string)

        while i < n:
            if string[i] in node.children:
                node = node.children[string[i]]
                i = i + 1
            else:
                break

        while i < n:
            node.children[string[i]] = Trie(string[:i], {})
            node = node.children[string[i]]
            i = i + 1

        node.value = value


    def find(self, key):
        node = self.root
        for char in key:
            if char in node.children:
                node = node.children[char]
            else:
                return None
        return node

I couldn't figure it out how to count the number of strokes:

data = ['hello', 'hell', 'heaven', 'goodbye']
tree = PrefixTree(data)
strokes = {w:1 for w in tree.data} #at least 1 stroke is necessary

And here's the code to parse the tree from the root:

stack = Stack(100)
stack.push((None, pf.root))
print 'Key\tChilds\tValue'
print '--'*25

strokes = {}

while stack.i > 0:
    key, curr = stack.pop()

    # if something:
         #update strokes

    print '%s\t%s\t%s' % (key, len(curr.children), curr.value)
    for key, node in curr.children.items():
        stack.push((key, node))

print strokes

Any idea or constructive comment would help, thanks!

Edit

Great answer by @SergiyKolesnikov. There's one small change that can be done in order to avoid the call to endsWith() . I just added a boolean field to the Trie class:

class Trie(object):
    def __init__(self, value, children, eow):
        self.value = value #char
        self.children = children #{key, trie}
        self.eow = eow # end of word

And at the end of insert():

def insert(self, string, value):
#...
    node.value = value
    node.eow = True

Then just replace curr.value.endswith('$'): with curr.eow . Thank you all!

The trie for your example can look like this

 ' '
|    \
H     G
|     |
E     O
| \   |
L  A  O
|  |  |
L$ V  D
|  |  |
O  E  B
   |  |
   N  Y
      |
      E

What nodes in the trie can be seen as markers for user key strokes? There are two types of such nodes:

  1. Inner nodes with more than one child, because the user has to choose among multiple alternatives.
  2. Nodes that represent the last letter of a word, but are not leaves (marked with $ ), because the user has to type the next letter if the current word is not what is needed.

While traversing the trie recursively one counts how many of these marker nodes were encountered before the last letter of a word was reached. This count is the number of strokes needed for the word.

For the word "hell" it is two marker nodes: ' ' and E (2 strokes).
For the word "hello" it is three marker nodes: ' ' , E , L$ (3 strokes).
And so on...

What needs to be changed in your implementation:

The end of a valid word needs to be marked in the tree, so that the second condition can be checked. Therefore, we change the last line of the PrefixTree.insert() method from

node.value = value

to

node.value = value + '$'

Now we add a stroke counter for each stack item (the last value in the triple pushed to the stack) and the checks that increase the counter:

stack = Stack(100)
stack.push((None, tree.root, 0)) # We start with stroke counter = 0
print('Key\tChilds\tValue')
print('--'*25)

strokes = {}

while stack.i > 0:
    key, curr, stroke_counter = stack.pop()

    if curr.value is not None and curr.value.endswith('$'):
        # The end of a valid word is reached. Save the word and the corresponding stroke counter.
        strokes[curr.value[:-1]] = stroke_counter

    if len(curr.children) > 1:
        # Condition 2 is true. Increase the stroke counter.
        stroke_counter += 1
    if curr.value is not None and curr.value.endswith('$') and len(curr.children) > 0:
        # Condition 1 is true. Increase the stroke counter.
        stroke_counter += 1

    print('%s\t%s\t%s' % (key, len(curr.children), curr.value))
    for key, node in curr.children.items():
        stack.push((key, node, stroke_counter)) # Save the stroke counter

print(strokes)

Output:

Key Childs  Value
--------------------------------------------------
None    2   None
h   1   
e   2   h
a   1   he
v   1   hea
e   1   heav
n   0   heaven$
l   1   he
l   1   hell$
o   0   hello$
g   1   
o   1   g
o   1   go
d   1   goo
b   1   good
y   1   goodb
e   0   goodbye$
{'heaven': 2, 'goodbye': 1, 'hell': 2, 'hello': 3}

While you go through your stack, you should keep a stroke counter for each node:

  • It begins at 0 for None.
  • If the current node has more than 2 children, the counter of the children will be 1 more than the current counter.
  • If the current value is a valid word and has at least one child, the counter of the child(ren) will be 1 more than the current counter.

For documentation purpose, here's my Ruby answer :

class Node
  attr_reader :key, :children
  attr_writer :final
  def initialize(key, children = [])
    @key = key
    @children = children
    @final = false
  end

  def final?
    @final
  end
end

class Trie
  attr_reader :root
  def initialize
    @root = Node.new('')
  end

  def add(word)
    node = root
    word.each_char.each{|c|
      next_node = node.children.find{|child| child.key == c}
      if next_node then
        node = next_node
      else
        next_node = Node.new(c)
        node.children.push(next_node)
        node = next_node
      end
    }
    node.final = true
  end

  def count_strokes(node=root,word="",i=0)
    word=word+node.key
    strokes = {}
    if node.final? then
      strokes[word]=i
      if node.children.size>0 then
        i+=1
      end
    elsif node.children.size>1 then
      i+=1
    end
    node.children.each{|c|
      strokes.merge!(count_strokes(c, word, i))
    }
    strokes
  end
end

data = ['hello', 'hell', 'heaven', 'goodbye']
trie =  Trie.new
data.each do |word|
  trie.add(word)
end

# File.readlines('/usr/share/dict/british-english').each{|line|
#   trie.add line.strip
# }

puts trie.count_strokes
#=> {"hell"=>2, "hello"=>3, "heaven"=>2, "goodbye"=>1}

60 lines only, and it take less than 3 seconds for 100 000 words.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM