
How to build a trie to solve this parsing algorithm

I am trying to use a trie to solve this problem:

A symbol string generator consists of two parts: a set of start symbols and a set of generation rules.
For example:
Start symbol: ['S'], Rules of generation: ["S → abc", "S → aA", "A → b", "A → c"]
Then, symbolic string abc can be generated because S → abc.
Symbolic string ab can be generated because S → aA → ab.
Symbolic string ac can be generated because S → aA → ac.
Now, given a symbolic string generator and a symbolic string, return True if the symbolic string can be generated, and False otherwise.

Example
Given generator = ["S -> abcd", "S -> Ad", "A -> ab", "A -> c"], startSymbol = S, symbolString = “abd”, return True.

explanation:
S → Ad → abd

Given generator = ["S → abc", "S → aA", "A → b", "A → c"], startSymbol = S, symbolString = “a”, return False

I think the key to this problem is building a trie, so I tried to write:

def build_trie(values): #value is like ['abc', 'Ad'...]
    root = {}
    for word in values:
        current = root
        is_end = False
        for c in word:
            if 'A' <= c <= 'Z':
                vals = m[c] #m is a mapping of {'S': ['abc', 'Ad'], ...}
                rs = build_trie(vals)
                for k in rs:
                    if k not in current:
                        current[k] = rs[k]
                    else:
                        # stuck here...
                        pass

                        # temp = collections.defaultdict(dict)
                        # for d in (current[k], rs[k]):
                        #     for k, v in d.items():
                        #         if k in temp and k != '__end__':
                        #             temp[k].update(v)
                        #         else:
                        #             temp[k] = v
                        # # current[k].update(rs[k])
                        # current[k] = temp[k]
                is_end = True
            else:
                current = current.setdefault(c, {})
                is_end = False
        if not is_end:
            current['__end__'] = '__end__'
    return root

but I got stuck on the else part and have not figured out how to write this trie. Any clue?

There are several parser libraries in Python that you may want to use. I have used the Lark parser. Its authors give a comparison of various Python parsers.

During my college days I implemented an LALR(1) parser in C, but I guess that will be of little use here. I found a useful implementation in Python here, if you want to write the entire parser yourself. I haven't tested that code.
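
If writing the check by hand is the goal, a full LALR(1) parser is probably more than this problem needs. As a rough sketch only (not the linked implementation), a memoized recursive check over sentential forms may be enough. The names `parse_rules` and `can_generate` are made up for illustration, and the sketch assumes single-character symbols, uppercase non-terminals as in the question, no empty productions, and no cyclic unit rules such as `A -> A`:

```python
from functools import lru_cache


def parse_rules(generator):
    """Turn ["S -> abcd", "S → Ad"] into {"S": ["abcd", "Ad"]}."""
    rules = {}
    for rule in generator:
        # Accept both arrow spellings used in the problem statement.
        head, body = rule.replace("→", "->").split("->")
        rules.setdefault(head.strip(), []).append(body.strip())
    return rules


def can_generate(generator, start_symbol, symbol_string):
    rules = parse_rules(generator)

    @lru_cache(maxsize=None)
    def derives(form, text):
        """True if the sentential form `form` can derive exactly `text`."""
        # Assumption: no empty productions, so every symbol yields at least
        # one character and an over-long form can be rejected early.
        if len(form) > len(text):
            return False
        if not form:
            return not text
        head, rest = form[0], form[1:]
        if "A" <= head <= "Z":
            # Non-terminal: try each of its productions in place.
            return any(derives(body + rest, text) for body in rules.get(head, ()))
        # Terminal: it must match the next input character.
        return text[0] == head and derives(rest, text[1:])

    return derives(start_symbol, symbol_string)
```

With the two examples from the question, `can_generate(["S -> abcd", "S -> Ad", "A -> ab", "A -> c"], "S", "abd")` should return `True`, and `can_generate(["S → abc", "S → aA", "A → b", "A → c"], "S", "a")` should return `False`.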

For the given grammar, I have written a validator using Lark, as below.

from lark import Lark
import sys

grammar = """
        start: "abcd"
         | A "d"
        A: "ab"
         | "c"
        """

parser = Lark(grammar)

def check_grammer(word):
    # Lark raises an exception when the word does not match the grammar.
    try:
        parser.parse(word)
        return True
    except Exception as exception:
        print(exception)
        return False


word = sys.argv[1]
print(check_grammer(word))
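
Coming back to the trie in the question: one possible way around the stuck `else` branch is to avoid building a separate sub-trie and merging it, and instead keep a list of the trie nodes reached so far and extend all of them. The sketch below is only an illustration under assumptions: the helper names `attach` and `in_trie` are invented here, `rules` is a mapping like the `m` in the question, symbols are single characters, and the grammar is non-recursive (otherwise the trie would be infinite and the recursion would not end).

```python
END = "__end__"  # same end marker as in the question


def attach(rules, nodes, body):
    """Extend every trie node in `nodes` with all strings derivable from
    `body` and return the nodes reached afterwards."""
    for symbol in body:
        if "A" <= symbol <= "Z":
            # Non-terminal: branch into each of its productions from every
            # node reached so far, then continue from all resulting nodes.
            reached = {}
            for alternative in rules.get(symbol, ()):
                for node in attach(rules, nodes, alternative):
                    reached[id(node)] = node  # drop duplicate nodes
            nodes = list(reached.values())
        else:
            # Terminal: setdefault reuses an existing child, so overlapping
            # prefixes are merged without a separate merge step.
            nodes = [node.setdefault(symbol, {}) for node in nodes]
    return nodes


def build_trie(rules, start_symbol):
    root = {}
    for node in attach(rules, [root], start_symbol):
        node[END] = END
    return root


def in_trie(trie, word):
    node = trie
    for c in word:
        if c not in node:
            return False
        node = node[c]
    return END in node


rules = {"S": ["abcd", "Ad"], "A": ["ab", "c"]}
trie = build_trie(rules, "S")
print(in_trie(trie, "abd"))
```

Because each trie path spells exactly one string, extending a shared node cannot mix up derivations; the end marker is only set on nodes where a complete derivation from the start symbol finishes.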

Hope it helps!
