
Deserialize tree given inorder format?

I have a string of nodes from a binary tree in the following serialized format:

# <- the value of a node
(a b c) <- node b has left child a and right child c.

If a node has a child, it will always have two children. All nodes are independent and their values are simply the value of node.data, so many nodes could have the same value (but they are still different nodes).

So for example:

(((1 6 3) 5 (8 1 2)) 10 (1 1 1))

Means the root of the tree has value 10, and has two children with values 5 and 1. The child with value 5 has two children, 6 and 1. The 6 has children 1 and 3, and the 1 has children 8 and 2, and so on.

I am trying to parse this into a tree but only know how to do it "inefficiently" by trimming the start/end parentheses, and then scanning the entire string until the number of ( matches the number of ) . So for instance:

(((1 6 3) 5 (8 1 2)) 10 (1 1 1)) 

becomes

((1 6 3) 5 (8 1 2)) 10 (1 1 1)

And so I scan, scan, scan, and have the parentheses counts match after I read ((1 6 3) 5 (8 1 2)) which means I have the left child, which means the next character will be the parent, and everything after that will be the right child. Recurse, recurse, and so on. Except this way I am wasting a lot of time re-scanning the left child at each step.

Is there a better way to do this?

You didn't specify your language of choice, but this format looks very Lisp-like. Lisp has convenient functions for working with lists like these; nevertheless, I'll try to give a general answer.

First, these are the steps of the recursive method, in some Scala-like code:

def getLocalRoot(record : String) : Node
{
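    // extractLeft, extractRight and extractLocalRoot are helpers that are
    // not defined here; their behaviour is described in the text below.
    val (leftChildrenString, rootPosition) = extractLeft(record)
    val rightChildrenString = extractRight(record, rootPosition)
    val localRootString = extractLocalRoot(record)
    val localRoot = new Node(localRootString)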
    if(leftChildrenString.contains('(')) //a hack, really
       localRoot.left = getLocalRoot(leftChildrenString) //not a leaf
    else
       localRoot.left = new Node(leftChildrenString)  //it is a leaf

    if(rightChildrenString.contains('('))
       localRoot.right=getLocalRoot(rightChildrenString)
    else
       localRoot.right = new Node(rightChildrenString)
    return localRoot
}

def findTreeRoot(serializedTree : String) : Node
{
    return getLocalRoot(serializedTree)
}

In ( (1 5 6) 2 (4 3 0) ), I'm calling the (1 5 6) part the "left children" and the (4 3 0) part the "right children".

Let's explain in words. First you need to split the string into its left and right sides, hence extractLeft and extractRight. I suggest you do this by parsing the string from left to right and counting the parentheses. As soon as the count of open parentheses drops back to 1 after a closing parenthesis, the next item is the root of that subtree. Then return the left part. You also return the position of the root of the subtree so you can pass it to the function that returns the right child, just to speed it up. The method returning the right part of the string should simply return everything to the right of the root (minus the final closing parenthesis).

Then take the current local root, store it, and call the same method on the left half and the right half, but only if that half is not a leaf. If it is a leaf, use it to instantiate a new node and attach it to the parent you just found. I used a hack here: I just check whether the string contains a parenthesis. You can come up with a better solution.
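The splitting steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the answer's own code: the names split_record and build are hypothetical, values are assumed to be integers, and tokens are assumed to be separated by whitespace. Like the approach in the question, it rescans the left subtree at every level.

```python
from collections import namedtuple

# Hypothetical node types, used only for this illustration.
Leaf = namedtuple("Leaf", ["value"])
Node = namedtuple("Node", ["value", "left", "right"])

def split_record(record):
    """Split "(left root right)" into its three parts as strings."""
    inner = record.strip()[1:-1].strip()  # drop the outer parentheses
    depth = 0
    for i, ch in enumerate(inner):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch.isspace() and depth == 0:
            # First top-level whitespace: the left part ends here,
            # the next token is the root, the remainder is the right part.
            left = inner[:i]
            root, right = inner[i:].split(None, 1)
            return left, root, right.strip()
    raise ValueError("malformed record: " + record)

def build(record):
    record = record.strip()
    if not record.startswith("("):
        return Leaf(int(record))  # a bare number is a leaf
    left, root, right = split_record(record)
    return Node(int(root), build(left), build(right))
```

For example, split_record("( (1 5 6) 2 (4 3 0) )") yields the three strings "(1 5 6)", "2" and "(4 3 0)".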

=============== Alternative approach ===========

This requires only one scan. I had to pad the parentheses with blanks so I could parse them more easily, but the crux is the same. I've basically used a stack: as soon as you get to a closing parenthesis, pop the top three items, merge them into one node, and push that node back.

import scala.collection.mutable

trait Node

case class Leaf(value: String) extends Node

case class ComplexNode(left: Node, value: Leaf, right: Node) extends Node

object Main {

  def main(args: Array[String]) = {
    val stack = new mutable.Stack[Node]
    var input = "(((1 6 3) 5 (8 1 2)) 10 (1 1 1))"
    input = input.replace(")", " ) ").replace("(", " ( ") //just to ease up parsing, it's easier to extract the numbers

    input.split(" ").foreach(word =>
      word match {
        case ")" => {
          stack push collapse(stack)
        }
        case c : String =>  {
          if (c != "(" && c != "") 
             stack.push(Leaf(c))
        }
      }
    )
    println(stack.pop) //you have your structure on the top of the stack
  }

  def collapse(stack: mutable.Stack[Node]): Node = {
    val right = stack.pop
    val parent = stack.pop.asInstanceOf[Leaf]
    val left = stack.pop
    return new ComplexNode(left, parent, right)

  }
}
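For readers who prefer Python, the same stack idea might look like the sketch below. It is an illustration under assumptions rather than a translation of the code above: the name parse_with_stack is hypothetical, values are assumed to be non-negative integers, and a regular expression is used for tokenizing instead of padding the parentheses with blanks.

```python
import re
from collections import namedtuple

# Hypothetical node types, used only for this illustration.
Leaf = namedtuple("Leaf", ["value"])
Node = namedtuple("Node", ["value", "left", "right"])

def parse_with_stack(text):
    stack = []
    for token in re.findall(r"[()]|\d+", text):
        if token == "(":
            continue  # opening parentheses carry no information here
        if token == ")":
            # The top three items are right child, parent value, left child.
            right = stack.pop()
            parent = stack.pop()
            left = stack.pop()
            stack.append(Node(parent.value, left, right))
        else:
            stack.append(Leaf(int(token)))
    return stack.pop()  # the finished tree is the only item left
```

Each token is pushed or popped a constant number of times, so the whole parse is a single left-to-right pass.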

Yes, you can write a simple recursive parsing function. When you parse a tree:

  • if you see '(', you know that next you'll need to read the left child (i.e. parse a tree recursively), then the value of the node, and then the right child (parse a tree recursively again).

  • if you see a number, you know that it's a leaf.

This approach takes O(n) time and uses O(treeHeight) additional memory (i.e. beyond the input string and the resulting tree) for the stack of recursive calls.

Here is a code sample in Python 3:

import re
from collections import namedtuple

Leaf = namedtuple("Leaf", ["value"])

Node = namedtuple("Node", ["value", "left", "right"])

def parse(string):
    # iterator, which returns '(', ')' and numbers from the string
    tokens = re.finditer(r"[()]|\d+", string)

    def next_token():
        return next(tokens).group()

    def tree():
        token = next_token()
        if token == '(':
            left, value, right = tree(), element(), tree()
            next_token() # skipping closing bracket
            return Node(value, left, right)
        else:
            return Leaf(int(token))

    def element():
        return int(next_token())

    return tree()

Testing:

In [2]: parse("(((1 6 3) 5 (8 1 2)) 10 (1 1 1))")
Out[2]: Node(value=10, left=Node(value=5, left=Node(value=6, left=Leaf(value=1), right=Leaf(value=3)), right=Node(value=1, left=Leaf(value=8), right=Leaf(value=2))), right=Node(value=1, left=Leaf(value=1), right=Leaf(value=1)))

The simplest and most concise way to get separate tokens (parentheses and numbers) from the string is with regular expressions. Most regex libraries support iteration over non-overlapping matches without creating the whole array of matches at once.
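As a small illustration of that lazy iteration in Python, the sketch below wraps re.finditer in a generator; the name iter_tokens is hypothetical. Matches are produced one at a time as the consumer asks for them, so no list of all tokens is built up front.

```python
import re

TOKEN = re.compile(r"[()]|\d+")

def iter_tokens(text):
    # re.finditer returns an iterator, so each match is found on demand.
    for match in TOKEN.finditer(text):
        yield match.group()

tokens = iter_tokens("((1 6 3) 5 (8 1 2))")
first = next(tokens)   # "("
second = next(tokens)  # "("
third = next(tokens)   # "1"
```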

For example, in Java the tokens variable can be declared as Matcher tokens = Pattern.compile("[()]|\\d+").matcher(string); and next_token() then becomes a call to tokens.find() followed by tokens.group(). In C++ you can use std::regex_iterator for the same purpose.

But you can also implement tokenizing manually by maintaining an index of the current character. For example:

def parse(source):
    index = 0

    def next_number():
        nonlocal index
        start = index
        while index < len(source) and source[index].isdigit():
            index += 1
        return source[start:index]

    def next_token():
        nonlocal index
        while index < len(source):
            current = source[index]
            if current in [')', '(']:
                index += 1
                return current
            if current.isdigit():
                return next_number() 
            index += 1

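    # tree() and element() are the same as in the regex version above;
    # Leaf and Node are the namedtuples defined there.
    def tree():
        token = next_token()
        if token == '(':
            left, value, right = tree(), element(), tree()
            next_token()  # skip the closing bracket
            return Node(value, left, right)
        return Leaf(int(token))

    def element():
        return int(next_token())

    return tree()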

