
Parsing a code block with an EBNF expression

I am using Coco/R to generate a scanner/parser for a Java-like language.
I'm having some trouble creating an EBNF expression that matches a code block.

I'm assuming a code block is surrounded by two well-known tokens, <& and &>. Example:

public method(int a, int b) <&  
various code  
&>  

I define a nonterminal symbol like this:

codeblock = "<&" {ANY} "&>"  

If the code between the two symbols contains a '<' character, the generated compiler cannot handle it and reports a syntax error.

Any hints?

Edit:

COMPILER JavaLike
CHARACTERS

nonZeroDigit  = "123456789".
digit         = '0' + nonZeroDigit .
letter        = 'A' .. 'Z' + 'a' .. 'z' + '_' + '$'.

TOKENS
ident = letter { letter | digit }.

PRODUCTIONS
JavaLike = {ClassDeclaration}.
ClassDeclaration ="class" ident ["extends" ident] "{" {VarDeclaration} {MethodDeclaration }"}" .
MethodDeclaration ="public" Type ident "("ParamList")" CodeBlock.
CodeBlock = "<&" {ANY} "&>".

I have omitted some productions for the sake of simplicity.
This is my actual implementation of the grammar. The main bug is that it fails if the code in the block contains one of the symbols '>' or '&'.

Nick, late to the party here...

There are a number of ways to do this:

Define tokens for <& and &> so the lexer knows about them.

You may be able to use a COMMENTS directive:

COMMENTS FROM "<&" TO "&>" (with the delimiters quoted, as Coco/R expects).

Or hack NextToken() in your scanner.frame file. Do something like this (pseudo-code):

if (Peek() == CODE_START)
{
     while (NextToken() != CODE_END)
     {
        // eat tokens
     }
}

Or you can override the Read() method in the Buffer and eat the characters at the lowest level.
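To make the "eat at the lowest level" idea concrete, here is a minimal sketch in Python rather than in Coco/R's generated scanner code. It is not Coco/R's API: the function name split_code_blocks and the chunk labels are made up for illustration. It works on raw characters, so '<', '>' and '&' inside the block never reach a tokenizer.

```python
OPEN, CLOSE = "<&", "&>"  # the block delimiters from the question


def split_code_blocks(source: str):
    """Yield ("text", s) and ("code", s) chunks.

    Code bodies are copied verbatim, character by character, so their
    contents are never tokenized. Hypothetical helper, not part of Coco/R.
    """
    pos = 0
    while True:
        start = source.find(OPEN, pos)
        if start == -1:
            # No more blocks: emit whatever ordinary text is left.
            if pos < len(source):
                yield ("text", source[pos:])
            return
        if start > pos:
            yield ("text", source[pos:start])
        end = source.find(CLOSE, start + len(OPEN))
        if end == -1:
            raise SyntaxError(f"unterminated code block at offset {start}")
        yield ("code", source[start + len(OPEN):end])
        pos = end + len(CLOSE)
```

The obvious limitation of this sketch is that the first "&>" always closes the block, even if it appears inside a string literal in the body, and blocks cannot nest.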

HTH

You can expand the ANY term to include <&, &>, and another nonterminal (call it ANY_WITHIN_BLOCK, say).

Then you just use

ANY = "<&" | {ANY_WITHIN_BLOCK} | "&>"
codeblock = "<&" {ANY_WITHIN_BLOCK} "&>"

The meaning of {ANY} is then unchanged if you really need it later.

Okay, I didn't know anything about Coco/R and gave you a useless answer, so let's try again.

As I started to say later in the comments, I feel that the real issue is that your grammar might be too loose and not sufficiently well specified.

When I wrote the CFG for the one language I've tried to create, I ended up using a sort of "meet-in-the-middle" approach: I wrote the top-level structure and the immediate low-level combinations of tokens first, and then worked to make them meet in the mid-level (at about the level of conditionals and control flow, I guess).

You said this language is a bit like Java, so let me just show you the first lines I would write as a first draft to describe its grammar (in pseudocode, sorry; it's actually like yacc/bison, and I'm using your brackets instead of Java's):

/* High-level stuff */
program: classes
classes: main-class inner-classes
inner-classes: inner-classes inner-class
             | /* empty */
main-class: class-modifier "class" identifier class-block
inner-class: "class" identifier class-block
class-block: "<&" class-decls "&>"
class-decls: field-decl
           | method
method: method-signature method-block
method-block: "<&" statements "&>"
statements: statements statement
          | /* empty */
class-modifier: "public"
              | "private"
identifier: /* well, you know */

And at the same time as you do all that, figure out your immediate token combinations, for example defining "number" as a float or an int and then creating rules for adding, subtracting, and so on.
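As a rough illustration of that low-level end, here is a small Python sketch, not yacc/bison and not Coco/R. The names tokenize and parse_additive are invented for this example. A "number" token is either a float or an int, and one rule, additive: number { ('+' | '-') number }, combines them.

```python
import re

# One token per match: a float, an int, or an additive operator.
TOKEN_RE = re.compile(r"\s*(?:(\d+\.\d+)|(\d+)|([+\-]))")


def tokenize(text):
    """Split text into ("float", v), ("int", v) and ("op", s) tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(1) is not None:
            tokens.append(("float", float(m.group(1))))
        elif m.group(2) is not None:
            tokens.append(("int", int(m.group(2))))
        else:
            tokens.append(("op", m.group(3)))
        pos = m.end()
    return tokens


def parse_additive(tokens):
    """Evaluate: additive = number { ('+' | '-') number }."""
    def number(i):
        # "number" is the low-level rule: a float or an int token.
        if i >= len(tokens) or tokens[i][0] not in ("int", "float"):
            raise SyntaxError("number expected")
        return tokens[i][1], i + 1

    value, i = number(0)
    while i < len(tokens):
        kind, op = tokens[i]
        if kind != "op":
            raise SyntaxError("operator expected")
        rhs, i = number(i + 1)
        value = value + rhs if op == "+" else value - rhs
    return value
```

For example, parse_additive(tokenize("1 + 2.5 - 3")) evaluates left to right. Rules for statements and blocks would then be written in terms of small rules like this one.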

I don't know what your approach is so far, but you definitely want to make sure you carefully specify everything and use new rules when you want a specific structure. Don't get ridiculous with creating one-to-one rules, but never be afraid to create a new rule if it helps you organize your thoughts better.
