
Packrat caching: right to left vs. left to right?

I'm currently trying to familiarize myself with packrat parsing. So I've read the PDF paper from 2002 linked here, and section 2.3 describes packrat caching as a preliminary process (occurring before the actual parsing) in which a full caching table is pre-constructed by reading the input from right to left. Only then can the actual linear parsing from left to right start.
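For comparison, here is a minimal sketch of the table-first approach the question describes: every cell of the table is filled by scanning positions from right to left, and only then is the result read off at position 0. It assumes a hypothetical toy grammar, E <- Num '+' E / Num with Num a single digit token, and the names build_table and parse are illustrative; it follows the idea of section 2.3 loosely and is not the paper's own code.

```python
# Sketch of table-first packrat parsing for a HYPOTHETICAL toy grammar:
#   E <- Num '+' E / Num      (Num = a single digit token)
# Each cell holds (value, next_position), or None if the rule fails there.

def build_table(tokens):
    n = len(tokens)
    num = [None] * (n + 1)  # row for Num, one column per position 0..n
    e = [None] * (n + 1)    # row for E
    # Fill columns from right to left: every cell only reads cells further
    # to the right, which are already complete.
    for i in range(n, -1, -1):
        if i < n and tokens[i].isdigit():
            num[i] = (int(tokens[i]), i + 1)
        if num[i] is None:
            continue  # E fails wherever Num fails
        value, j = num[i]
        if j < n and tokens[j] == '+' and e[j + 1] is not None:
            rest, k = e[j + 1]
            e[i] = (value + rest, k)  # first alternative: Num '+' E
        else:
            e[i] = num[i]             # second alternative: Num
    return num, e

def parse(tokens):
    # Parsing proper is now a single lookup at position 0.
    _, e = build_table(tokens)
    result = e[0]
    if result is None or result[1] != len(tokens):
        return None  # no parse, or input left over
    return result[0]
```

For example, parse(['2', '+', '3', '+', '4']) returns 9, and the E row of the table is [(9, 5), None, (7, 5), None, (4, 5), None].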

But in every PEG parser implementation I found, the "cache" option is usually a caching process that occurs during the actual left-to-right parsing. For example, here.
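The on-demand style found in most implementations can be sketched with Python's functools.lru_cache: each parsing function memoizes its result per position the first time the left-to-right parse asks for it. This assumes the same hypothetical toy grammar (E <- Num '+' E / Num, single-digit tokens) and illustrative names; a real parser would likely manage its own memo table rather than rely on lru_cache.

```python
from functools import lru_cache

# Sketch of on-demand memoization for the same HYPOTHETICAL toy grammar:
#   E <- Num '+' E / Num      (Num = a single digit token)

def parse(tokens):
    n = len(tokens)

    @lru_cache(maxsize=None)  # memo table for Num, filled only when asked
    def parse_num(i):
        if i < n and tokens[i].isdigit():
            return int(tokens[i]), i + 1
        return None

    @lru_cache(maxsize=None)  # memo table for E, filled only when asked
    def parse_e(i):
        first = parse_num(i)
        if first is None:
            return None
        value, j = first
        if j < n and tokens[j] == '+':
            rest = parse_e(j + 1)
            if rest is not None:
                return value + rest[0], rest[1]  # first alternative: Num '+' E
        return first  # second alternative: Num, reusing the result above

    # A single left-to-right parse from position 0; memo entries appear as needed.
    result = parse_e(0)
    if result is None or result[1] != n:
        return None
    return result[0]
```

For example, parse(['2', '+', '3', '+', '4']) returns 9.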

Is there any difference between the two approaches? Thank you.

I recently worked on similar research, ran into exactly the same confusion, and resolved it. Whether or not you are still working on this topic, here's my answer.

Your understanding is correct:

  • The packrat parser scans the input string from left to right
  • The packrat parser constructs the cache from right to left

But there is just one approach, not two. Let's use a simple Parsing Expression Grammar (PEG) without left recursion as an example: E -> Num + E | Num
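One way to see why this is a single approach is to write the rule for E once and evaluate it in both orders. The sketch below assumes the grammar above written in PEG notation as E <- Num '+' E / Num, with single-digit number tokens; rule_e, eager_table and lazy_memo are hypothetical names. eager_table fills every position right to left before parsing, while lazy_memo only computes what a parse starting at position 0 asks for; the entries it does compute match the table and are written in right-to-left order.

```python
# HYPOTHETICAL toy grammar in PEG notation:
#   E <- Num '+' E / Num      (Num = a single digit token)

def rule_e(tokens, i, e_at):
    # The rule body, written once; e_at(k) supplies E's result at position k.
    if not (i < len(tokens) and tokens[i].isdigit()):
        return None                          # Num fails, so E fails
    value, j = int(tokens[i]), i + 1
    if j < len(tokens) and tokens[j] == '+':
        rest = e_at(j + 1)                   # always a position to the right of i
        if rest is not None:
            return value + rest[0], rest[1]  # E -> Num '+' E
    return value, j                          # E -> Num

def eager_table(tokens):
    # Table-first: fill every position from right to left before parsing.
    table = {}
    for i in range(len(tokens), -1, -1):
        table[i] = rule_e(tokens, i, table.__getitem__)
    return table

def lazy_memo(tokens, start=0):
    # On demand: compute a position only when the parse from `start` needs it.
    memo = {}
    def e_at(i):
        if i not in memo:
            memo[i] = rule_e(tokens, i, e_at)
        return memo[i]
    e_at(start)
    return memo

tokens = ['2', '+', '3', '+', '4']
table = eager_table(tokens)
memo = lazy_memo(tokens)
print(list(memo))                              # [4, 2, 0]
print(all(memo[i] == table[i] for i in memo))  # True
```

The lazy version simply skips the positions (1, 3 and 5 here) that the parse never visits; the values it stores are the same ones the full table holds.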

(Note that a left-recursive example would require another long explanation; you can refer to CPython's implementation for details.)

The Syntax-Directed Translation (SDT) will be something like:

E -> a=Num + b=E { a + b }
E -> Num { Num }

We can then write a parse_E function as follows:

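tokens = ['2', '+', '3', '+', '4']         # the input `2 + 3 + 4`, one token per position
cache = {'parse_E': {}, 'parse_Char': {}}  # one memo table per parsing function

def parse_Char(idx):
    # Terminal: the token at idx and the next position, or None past the end
    if idx not in cache['parse_Char']:
        cache['parse_Char'][idx] = (tokens[idx], idx + 1) if idx < len(tokens) else None
    return cache['parse_Char'][idx]

def parse_E(idx):
    if idx in cache['parse_E']:
        return cache['parse_E'][idx]

    char = parse_Char(idx)
    if char is None or not char[0].isdigit():
        cache['parse_E'][idx] = None  # E cannot start here; remember the failure
        return None

    lval, nidx = char
    operator = parse_Char(nidx)
    if operator is not None and operator[0] == '+':
        # E -> Num + E
        rest = parse_E(operator[1])
        if rest is not None:
            rval, nnnidx = rest
            cache['parse_E'][idx] = int(lval) + rval, nnnidx
            return cache['parse_E'][idx]

    # E -> Num
    cache['parse_E'][idx] = int(lval), nidx
    return cache['parse_E'][idx]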

According to Bryan Ford's paper, the parser needs to scan the input string from left to right and fill in the cache at every position:

for idx in range(len(tokens)):  # every token position, left to right
    parse_E(idx)
    parse_Char(idx)
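To see what the driver loop buys, the sketch below instruments a memoized parse_E/parse_Char pair with counters and runs the loop over every token position of `2 + 3 + 4`; run_driver, computed and hits are hypothetical names added only for illustration. Each (rule, position) pair is computed at most once and every later request is served from the cache, which is what keeps packrat parsing linear in the input length.

```python
from collections import Counter

def run_driver(tokens):
    # HYPOTHETICAL instrumentation: count real computations vs. cache hits.
    cache = {'parse_E': {}, 'parse_Char': {}}
    computed = Counter()  # (rule, position) -> number of real evaluations
    hits = Counter()      # rule -> number of answers served from the cache

    def parse_Char(idx):
        if idx in cache['parse_Char']:
            hits['parse_Char'] += 1
            return cache['parse_Char'][idx]
        computed[('parse_Char', idx)] += 1
        result = (tokens[idx], idx + 1) if idx < len(tokens) else None
        cache['parse_Char'][idx] = result
        return result

    def parse_E(idx):
        if idx in cache['parse_E']:
            hits['parse_E'] += 1
            return cache['parse_E'][idx]
        computed[('parse_E', idx)] += 1
        result = None                       # None records "E fails here"
        char = parse_Char(idx)
        if char is not None and char[0].isdigit():
            result = int(char[0]), char[1]  # E -> Num
            op = parse_Char(char[1])
            if op is not None and op[0] == '+':
                rest = parse_E(op[1])
                if rest is not None:        # E -> Num + E succeeded, so it wins
                    result = result[0] + rest[0], rest[1]
        cache['parse_E'][idx] = result
        return result

    for idx in range(len(tokens)):          # the left-to-right driver loop
        parse_E(idx)
        parse_Char(idx)
    return cache, computed, hits

cache, computed, hits = run_driver(['2', '+', '3', '+', '4'])
print(max(computed.values()))  # 1
```

For `2 + 3 + 4`, the loop itself only evaluates parse_E at positions 1 and 3 (both fail, since they start with `+`); positions 2 and 4 are already cached by the idx=0 call.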

So let's check the cache construction under the hood. Initially, we have an empty cache and the input string:

cache: {'parse_E': {}, 'parse_Char': {}}
input string: `2 + 3 + 4`

When idx=0, the functions are called in the following order. Clearly, the call at position 0 alone already builds the cache from right to left, before the loop even reaches idx=1 or above.

  • parse_E(Y) is completed and written to the cache before parse_E(X) whenever Y > X
  • parse_Char(X) must complete before parse_E(X)
   parse_E(0)     ---   (E -> Num + E) (pending)
-> parse_Char(0)  --- 2 (pending)
-> parse_Char(1)  --- + (pending)
-> parse_E(2)     --- E (E -> Num + E) (pending)
-> parse_Char(2)  --- 3 (pending)       
-> parse_Char(3)  --- + (pending)
-> parse_E(4)     --- E (E -> Num) (pending)
-> parse_Char(4)  --- 4 (acc)

# Only after parse_Char(4) succeeds and is written to the cache can parse_E(4) succeed; then parse_E(2) completes, and finally parse_E(0).

If you want to read a full Python example of a Packrat parser implementation, you can check my repository. It contains a handmade Packrat parser and a Packrat parser generated by CPython's PEG generator from a simple meta grammar.

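Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.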

 
© 2020-2024 STACKOOM.COM