
How to create a simple parser using PEG.js

The syntax I would like to parse is of the following kind:

# This is a comment

# This is a block. It starts with \begin{} and ends with \end{}
\begin{document}

# Within the document block other kinds of blocks can exist, and within them yet other kinds.
# Comments can exist anywhere in the code.

This is another block within the block. It is a paragraph, no formal \begin{} and \end{} are needed.
The parser infers its type as a ParagraphBlock. The block ends with the newline.

\end{document}

I am learning how to use PEG, and this is what I have developed so far for the current syntax:

Start
  = (Newline / Comment / DocumentBlock)*
  
Comment
  = '#' value: (!Newline .)* Newline? {
    return {
      type: "comment",
      value: value.map(y => y[1]).join('').trim()
    }
  } 
  
Newline
  = [\n\r\t]
  
DocumentBlock
  = "\\begin\{document\}"
  
    (!"\\end\{document\}" DocumentChildren)*
    
    "\\end\{document\}"
    
DocumentChildren
  = NewlineBlock / ParagraphBlock
    
NewlineBlock
  = value: Newline*
  {
    return {
      type: "newline",
      value: value.length
    }
  }
    
ParagraphBlock
  = (!Newline .)* Newline

I am having some issues with infinite loops. The current code produces this error:

Line 19, column 5: Possible infinite loop when parsing (repetition used with an expression that may not consume any input).

What would be a correct implementation for the simple syntax above?

I think this is due to the NewlineBlock rule using a Kleene star on Newline.

In DocumentBlock you have a repeated DocumentChildren. In NewlineBlock you have a repeated Newline with *, which means NewlineBlock can always succeed by matching '', the empty string. DocumentChildren can therefore succeed without consuming any input, so the repetition in DocumentBlock could match it again and again at the same position, which would cause an infinite loop.

Changing the * in NewlineBlock to a + would fix the problem. That way it no longer has the option of matching the empty string.

NewlineBlock
  = value: Newline+
  {
    return {
      type: "newline",
      value: value.length
    }
  }
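
To make the reasoning concrete, here is a minimal sketch in plain JavaScript of why an outer repetition stalls on a zero-or-more rule but not on a one-or-more rule. It does not use PEG.js and does not show how PEG.js works internally; the helpers `newline`, `star`, `plus`, and `repeat` are hypothetical names made up for this illustration, and parsers are modelled simply as functions that return the new position or -1 on failure.

```javascript
// Hypothetical mini-combinators for illustration only (not PEG.js internals).
// A parser is a function (input, pos) -> new position, or -1 on failure.

// Mirrors the question's Newline rule: one character from [\n\r\t].
const newline = (input, pos) =>
  pos < input.length && "\n\r\t".includes(input[pos]) ? pos + 1 : -1;

// e* : zero or more repetitions. Always succeeds, possibly consuming nothing.
const star = (p) => (input, pos) => {
  let next = p(input, pos);
  while (next !== -1) {
    pos = next;
    next = p(input, pos);
  }
  return pos;
};

// e+ : one or more repetitions. Fails unless p matches at least once.
const plus = (p) => (input, pos) => {
  const first = p(input, pos);
  return first === -1 ? -1 : star(p)(input, first);
};

// Outer repetition, playing the role of (DocumentChildren)*.
// Instead of hanging, it reports a stall when p succeeds without
// advancing, which is the situation the PEG.js error warns about.
function repeat(p, input) {
  let pos = 0;
  for (;;) {
    const next = p(input, pos);
    if (next === -1) return { stalled: false, pos }; // p failed: repetition ends normally
    if (next === pos) return { stalled: true, pos }; // p matched '' : would loop forever
    pos = next;
  }
}

const input = "\n\nabc";
const withStar = repeat(star(newline), input); // like NewlineBlock = Newline*
const withPlus = repeat(plus(newline), input); // like NewlineBlock = Newline+

console.log(withStar, withPlus);
```

With the input above, both variants consume the two leading newlines. The `*` variant then keeps succeeding at position 2 without advancing, so `withStar.stalled` is `true`, while the `+` variant fails at position 2 and the outer repetition ends normally, so `withPlus.stalled` is `false`.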


 