
Lexer Mode in Antlr

I'm trying to write an ANTLR parser for some text that is formatted like this:

RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA],
RP   PROTEIN SEQUENCE OF 1-22; 2-17;
RP   240-256; 318-339 AND 381-390, AND CHARACTERIZATION.

Basically, every line has a leading 'RP ' to indicate what the line is for, and the last line ends with a "." to mark the end of this group of lines. The text itself can be anything. What I need in the end is the text.

I wrote an ANTLR grammar for this purpose:

grammar RefLine;

rp_line: RP_HEADER RP_TEXT;

RP_HEADER : 'RP   '            -> pushMode(RP_FREE_TEXT_MODE);

mode RP_FREE_TEXT_MODE;
RP_HEADER_SKIP: '\nRP   '      -> skip;
RP_TEXT: .+;
DOT_NEWLINE: '.\n'             -> popMode;

The idea is that when the lexer sees the first RP_HEADER, it switches to RP_FREE_TEXT_MODE and so skips any RP_HEADER on the lines in between. When it sees DOT_NEWLINE, it goes back to the main mode.

This grammar, however, doesn't compile with ANTLR 4.1; it produces these messages:

[ERROR] Message{errorType=MODE_NOT_IN_LEXER, args=[RP_FREE_TEXT_MODE, org.antlr.v4.tool.Grammar@5c0662], e=null, fileName='RefLine.g4', line=7, charPosition=5}
[WARNING] Message{errorType=IMPLICIT_TOKEN_DEFINITION, args=[RP_TEXT], e=null, fileName='RefLine.g4', line=3, charPosition=19}

I don't quite understand why the error is produced. Can anyone explain the correct way to use lexer modes in ANTLR? Also, are tokens defined inside a mode not available to parser rules?

EDIT:

As @auselen suggested, I put the lexer grammar in a separate file, RefLineLex.g4:

lexer grammar RefLineLex;

RP_HEADER : 'RP   '            -> pushMode(RP_FREE_TEXT_MODE);

mode RP_FREE_TEXT_MODE;
RP_HEADER_SKIP: '\nRP   '      -> skip;
RP_TEXT: .+;
DOT_NEWLINE: '.\n'             -> popMode;

And in another combined grammar, RefLine.g4, I have:

grammar RefLine;
import RefLineLex;

rp_line: RP_HEADER RP_TEXT ;

Now ANTLR compiles the files, but the generated RefLineLexer.java contains:

private void RP_HEADER_action(RuleContext _localctx, int actionIndex) {
    switch (actionIndex) {
    case 0: pushMode(RP_FREE_TEXT_MODE);  break;
    }
}

The constant RP_FREE_TEXT_MODE is not defined anywhere in RefLineLexer.java. Am I still missing something?

Lexer modes are only available in lexer grammars, not in combined grammars (lexer + parser). See Lexer Rules for some poor documentation, and check the XML Parser implementation on GitHub for an example.

You should have been able to work this out from the very informative errorType=MODE_NOT_IN_LEXER message in the error output :)
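
For illustration, here is a minimal sketch of the split, assuming two hypothetical files named RefLineLexer.g4 and RefLineParser.g4 (neither name comes from the question). The parser grammar picks up the lexer's token types through the tokenVocab option instead of import, so a token defined inside a mode, such as RP_TEXT, can still be referenced from a parser rule. The text rules are my own assumption as well: a greedy `.+` in a lexer rule would most likely run to the end of the input and swallow the closing '.\n', so this sketch matches text in smaller pieces and relies on the longer DOT_NEWLINE and RP_HEADER_SKIP matches winning. It also assumes Unix line endings and that the last record ends with a newline.

```antlr
// RefLineLexer.g4 (hypothetical file name)
lexer grammar RefLineLexer;

// The first header of a record switches the lexer into free-text mode.
RP_HEADER : 'RP   ' -> pushMode(RP_FREE_TEXT_MODE) ;
// Blank lines between records are ignored (an assumption, not in the question).
NEWLINE   : '\n' -> skip ;

mode RP_FREE_TEXT_MODE;
// Continuation headers inside a record are dropped.
RP_HEADER_SKIP : '\nRP   ' -> skip ;
// A period at the end of a line closes the record and returns to the default mode.
DOT_NEWLINE    : '.\n' -> popMode ;
// Any other run of text, or a period that is not at the end of a line.
RP_TEXT        : ~[.\n]+ | '.' ;
```

```antlr
// RefLineParser.g4 (hypothetical file name)
parser grammar RefLineParser;

// Reuse the token types generated from RefLineLexer.g4.
options { tokenVocab = RefLineLexer; }

// One record: a header, one or more pieces of text, then the closing ".\n".
rp_line : RP_HEADER RP_TEXT+ DOT_NEWLINE ;
```

With this layout the lexer grammar should be processed before the parser grammar so that the token vocabulary already exists. The text of a record is then the concatenation of its RP_TEXT tokens. A newline inside a record that is not followed by 'RP   ' matches no rule in this sketch and would be reported as a token recognition error.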
