Token matching with Antlr4

Question

I am an Antlr4 newbie and have problems with a relatively simple grammar. The grammar is given at the end. (This is a fragment from a grammar for parsing descriptions of biological sequence variants.)

I am trying to parse the string "p.A3L" in the following unit test.

@Test
public void testProteinSubtitutionWithoutRef() {
    ANTLRInputStream inputStream = new ANTLRInputStream("p.A3L");
    HGVSLexer l = new HGVSLexer(inputStream);
    HGVSParser p = new HGVSParser(new CommonTokenStream(l));
    p.setTrace(true);
    p.addErrorListener(new BaseErrorListener() {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line,
                int charPositionInLine, String msg, RecognitionException e) {
            throw new IllegalStateException("failed to parse at line " + line + " due to " + msg, e);
        }
    });
    p.hgvs();
}

The test fails with the message "line 1:2 mismatched input 'A3L' expecting AA". I assume that this is related to lexing, i.e. splitting "A3L" into the three tokens A, 3, and L, so that the parser can then build the corresponding syntax subtree containing the three terminals.

What is going wrong here and where can I learn how to fix this?

The grammar

grammar HGVS;

hgvs: protein_var
    ;

// Basic lexemes

AA: AA1
  | AA3
  | 'X';

AA1: 'A'
   | 'R'
   | 'N'
   | 'D'
   | 'C'
   | 'Q'
   | 'E'
   | 'G'
   | 'H'
   | 'I'
   | 'L'
   | 'K'
   | 'M'
   | 'F'
   | 'P'
   | 'S'
   | 'T'
   | 'W'
   | 'Y'
   | 'V';

AA3: 'Ala'
   | 'Arg'
   | 'Asn'
   | 'Asp'
   | 'Cys'
   | 'Gln'
   | 'Glu'
   | 'Gly'
   | 'His'
   | 'Ile'
   | 'Leu'
   | 'Lys'
   | 'Met'
   | 'Phe'
   | 'Pro'
   | 'Ser'
   | 'Thr'
   | 'Trp'
   | 'Tyr'
   | 'Val';

NUMBER: [0-9]+;

NAME: [a-zA-Z0-9_]+;

// Top-level Rule

/** Variant in a protein. */
protein_var: 'p.' AA NUMBER AA
           ;

Answer 1

There are two problems:

1. Define the parser rule protein_var ahead of the lexer rules. The grammar works as it stands, but it is hard to read with one parser rule above the lexer rules and the other below them.
2. Remove the NAME rule. A3L is not tokenized as AA NUMBER AA (as you probably expected) but as a single NAME token, because the ANTLR lexer always prefers the rule that matches the longest input. NAME matches all three characters of A3L, whereas AA matches only the first.

The resulting grammar should look like:
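To see why NAME wins, the longest-match behaviour can be sketched in plain Java without the ANTLR runtime. This is only a hand-written model, not ANTLR's actual implementation: the class MaximalMunchDemo, its methods, and the token name P_DOT (standing in for the implicit 'p.' token) are hypothetical, and regular expressions stand in for the lexer rules. At each position every rule is tried, the longest match wins, and a tie goes to the rule listed first.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of a longest-match lexer; not part of ANTLR.
final class MaximalMunchDemo {

    /** Builds the rule set in grammar order; NAME is optional. */
    static Map<String, Pattern> rules(boolean withName) {
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("P_DOT", Pattern.compile("p\\."));
        // Three-letter codes come first because Java regex alternation is ordered.
        rules.put("AA", Pattern.compile(
                "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
                + "|[ARNDCQEGHILKMFPSTWYVX]"));
        rules.put("NUMBER", Pattern.compile("[0-9]+"));
        if (withName) {
            rules.put("NAME", Pattern.compile("[a-zA-Z0-9_]+"));
        }
        return rules;
    }

    /** Splits the input into tokens: longest match wins, ties go to the earlier rule. */
    static List<String> tokenize(String input, Map<String, Pattern> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestRule = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
                Matcher m = rule.getValue().matcher(input);
                m.region(pos, input.length());
                // lookingAt() anchors the match at the current position;
                // the strict '>' keeps the earlier rule when lengths are equal.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestRule = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestRule == null) {
                throw new IllegalStateException("no rule matches at position " + pos);
            }
            tokens.add(bestRule + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // With NAME present, "A3L" is swallowed as one token.
        System.out.println(tokenize("p.A3L", rules(true)));   // [P_DOT(p.), NAME(A3L)]
        // Without NAME, the lexer produces the tokens the parser rule expects.
        System.out.println(tokenize("p.A3L", rules(false)));  // [P_DOT(p.), AA(A), NUMBER(3), AA(L)]
    }
}
```

The first result corresponds to the error in the question: the parser sees a NAME token at column 2 where protein_var expects AA.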

grammar HGVS;

hgvs
    : protein_var
    ;

protein_var
    : 'p.' AA NUMBER AA
    ;

AA: ...;

AA3: ...;

AA1: ...;

NUMBER: [0-9]+;

If you need NAME for other purposes, you will have to disambiguate it in the lexer, either by a prefix that NAME and AA do not have in common or by using lexer modes.
