
Context-Free Grammar for Custom Programming Language

After completing the Compiler Design course at my university, I have been playing around with writing a compiler for a simple programming language, but I'm having trouble with the parser. I'm writing the compiler in Moscow ML (mosml) and using its parser generator, mosmlyac, to construct the parser. Here is an excerpt from my parser showing the grammar and the associativity and precedence declarations:

```yacc
...
%right ASSIGN
%left OR
%left AND
%nonassoc NOT
%left EQUAL LESS
%left PLUS MINUS
%left TIMES DIVIDE
%nonassoc NEGATE
...
Prog : FunDecs EOF  { $1 }
;

FunDecs : Fun FunDecs   { $1 :: $2 }
        |               { [] }
;

Fun : Type ID LPAR TypeIds RPAR StmtBlock   { FunDec (#1 $2, $1, $4, $6, #2 $2) }
    | Type ID LPAR RPAR StmtBlock           { FunDec (#1 $2, $1, [], $5, #2 $2) }
;

TypeIds : Type ID COMMA TypeIds     { Param (#1 $2, $1) :: $4 }
        | Type ID                   { [Param (#1 $2, $1)] }
;

Type : VOID                     { Void }
     | INT                      { Int }
     | BOOL                     { Bool }
     | CHAR                     { Char }
     | STRING                   { Array (Char) }
     | Type LBRACKET RBRACKET   { Array ($1) }
;

StmtBlock : LCURLY StmtList RCURLY  { $2 }
;

StmtList : Stmt StmtList    { $1 :: $2 }
         |                  { [] }
;

Stmt : Exp SEMICOLON                    { $1 }
     | IF Exp StmtBlock                 { IfElse ($2, $3, [], $1) }
     | IF Exp StmtBlock ELSE StmtBlock  { IfElse ($2, $3, $5, $1) }
     | WHILE Exp StmtBlock              { While ($2, $3, $1) }
     | RETURN Exp SEMICOLON             { Return ($2, (), $1) }
;

Exps : Exp COMMA Exps   { $1 :: $3 }
     | Exp              { [$1] }
;

Index : LBRACKET Exp RBRACKET Index     { $2 :: $4 }
      |                                 { [] }
;

Exp : INTLIT                    { Constant (IntVal (#1 $1), #2 $1) }
    | TRUE                      { Constant (BoolVal (true), $1) }
    | FALSE                     { Constant (BoolVal (false), $1) }
    | CHRLIT                    { Constant (CharVal (#1 $1), #2 $1) }
    | STRLIT                    { StringLit (#1 $1, #2 $1) }
    | LCURLY Exps RCURLY        { ArrayLit ($2, (), $1) }
    | ARRAY LPAR Exp RPAR       { ArrayConst ($3, (), $1) }
    | Exp PLUS Exp              { Plus ($1, $3, $2) }
    | Exp MINUS Exp             { Minus ($1, $3, $2) }
    | Exp TIMES Exp             { Times ($1, $3, $2) }
    | Exp DIVIDE Exp            { Divide ($1, $3, $2) }
    | NEGATE Exp                { Negate ($2, $1) }
    | Exp AND Exp               { And ($1, $3, $2) }
    | Exp OR Exp                { Or ($1, $3, $2) }
    | NOT Exp                   { Not ($2, $1) }
    | Exp EQUAL Exp             { Equal ($1, $3, $2) }
    | Exp LESS Exp              { Less ($1, $3, $2) }
    | ID                        { Var ($1) }
    | ID ASSIGN Exp             { Assign (#1 $1, $3, (), #2 $1) }
    | ID LPAR Exps RPAR         { Apply (#1 $1, $3, #2 $1) }
    | ID LPAR RPAR              { Apply (#1 $1, [], #2 $1) }
    | ID Index                  { Index (#1 $1, $2, (), #2 $1) }
    | ID Index ASSIGN Exp       { AssignIndex (#1 $1, $2, $4, (), #2 $1) }
    | PRINT LPAR Exp RPAR       { Print ($3, (), $1) }
    | READ LPAR Type RPAR       { Read ($3, $1) }
    | LPAR Exp RPAR             { $2 }
;
```

`Prog` is the `%start` symbol, and I have left out the `%token` and `%type` declarations on purpose.

The problem is that this grammar seems to be ambiguous. Looking at the output of running `mosmlyac -v` on the grammar, it seems that the rules containing the token `ID` are the problem and create shift/reduce and reduce/reduce conflicts. The output also tells me that the rule `Exp : ID` is never reduced.

Can anyone help me make this grammar unambiguous?

`Index` has an empty production.

Now consider:

```yacc
Exp : ID
    | ID Index
```

Which of those applies? Since `Index` is allowed to be empty, there is no context in which only one of them is applicable: after reading an `ID`, the parser has to choose between reducing `Exp : ID` and reducing an empty `Index` (to continue with `Exp : ID Index`), and no lookahead token distinguishes the two. The parser generator you are using evidently prefers to reduce the empty `Index`, which makes `Exp : ID` unusable and creates a large number of conflicts.
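To see the conflicts in isolation, here is a stripped-down grammar. It is a hypothetical test file, not part of the question's compiler: it keeps only the `ID`-related rules plus one binary operator, with dummy `()` actions, and the declaration layout is assumed to follow the usual mosmlyac conventions. As in the question, `Index` is defined before `Exp`. After an `ID`, the lookaheads `SEMICOLON`, `PLUS` and `RBRACKET` allow both reducing `Exp : ID` and reducing the empty `Index` (a reduce/reduce conflict). On `ASSIGN`, the parser could either shift for `ID ASSIGN Exp` or reduce the empty `Index` for `ID Index ASSIGN Exp` (a shift/reduce conflict, since the empty rule has no precedence for `%right ASSIGN` to act on). Yacc-style generators usually resolve a reduce/reduce conflict in favour of the production listed first, which would explain why `Exp : ID` is reported as never reduced.

```yacc
/* Hypothetical reduced grammar, only for reproducing the conflicts. */
%token <string> ID
%token LBRACKET RBRACKET ASSIGN PLUS SEMICOLON EOF

%right ASSIGN
%left PLUS

%start Prog
%type <unit> Prog Index Exp

%%

Prog : Exp SEMICOLON EOF               { () }
;

/* Index is defined before Exp, as in the question. */
Index : LBRACKET Exp RBRACKET Index    { () }
      |                                { () }
;

Exp : ID                               { () }  /* reduce/reduce with empty Index */
    | ID Index                         { () }
    | ID ASSIGN Exp                    { () }  /* on ASSIGN: shift vs. reduce empty Index */
    | ID Index ASSIGN Exp              { () }
    | Exp PLUS Exp                     { () }
;
```

Running `mosmlyac -v` on a file like this should produce the same kind of conflict report as the full grammar, in the state reached right after shifting `ID`.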

I'd suggest changing `Index` to:

```yacc
Index : LBRACKET Exp RBRACKET Index     { $2 :: $4 }
      | LBRACKET Exp RBRACKET           { [ $2 ] }
```

although in the long run you might be better off with a more traditional "lvalue/rvalue" grammar, in which `lvalue` includes `ID` and `lvalue [ Exp ]`, and `rvalue` includes `lvalue`. (That will give a more elaborate parse tree for `ID [ Exp ] [ Exp ]`, but there is an obvious homomorphism.)
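A minimal sketch of that lvalue/rvalue approach, written as mosmlyac rules that would replace `Exp : ID`, `ID ASSIGN Exp`, `ID Index`, `ID Index ASSIGN Exp` and the `Index` nonterminal. It assumes, as the question's actions suggest, that an `ID` token carries a (name, position) pair and that `Var`, `Assign`, `Index` and `AssignIndex` take the arguments used in the question. `LValue` is a hypothetical nonterminal whose value pairs the `ID` with its index expressions in source order, so the nested lvalue parse is mapped back onto the question's existing AST nodes (one concrete form of the homomorphism). A `%type` declaration for `LValue` is still needed; it is left out here, like the question's other declarations.

```yacc
/* Hypothetical: an lvalue is an ID followed by zero or more [ Exp ]
   subscripts. Its value is (ID token value, index expressions in order). */
LValue : ID                             { ($1, []) }
       | LValue LBRACKET Exp RBRACKET   { (#1 $1, #2 $1 @ [$3]) }
;

Exp : INTLIT                { Constant (IntVal (#1 $1), #2 $1) }
    /* ... the remaining alternatives from the question, except
       ID, ID ASSIGN Exp, ID Index and ID Index ASSIGN Exp ... */

    /* rvalue: an lvalue used as an expression */
    | LValue                { case $1 of
                                ((name, pos), [])  => Var (name, pos)
                              | ((name, pos), ixs) => Index (name, ixs, (), pos) }
    /* only an lvalue may appear to the left of ASSIGN */
    | LValue ASSIGN Exp     { case $1 of
                                ((name, pos), [])  => Assign (name, $3, (), pos)
                              | ((name, pos), ixs) => AssignIndex (name, ixs, $3, (), pos) }
;
```

After an `LValue`, the parser only shifts on `LBRACKET` or `ASSIGN`, and neither token can follow a complete `Exp`, so the choice between extending the lvalue and reducing it to an expression is made by the lookahead alone. `ID LPAR ...` for function calls can stay as it is, because `LPAR` cannot follow an `LValue` either.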

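Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.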
