
Antlr4 parser fails - need backtracking?

I'm developing a grammar for a given language. I believe the grammar I've come up with should work, but Antlr4 disagrees. Judging by the errors, the parser seems to lack backtracking, yet Antlr4 is supposed to parse without it.

Each of the examples should have exactly one solution. There are ambiguities during parsing; however, all but one option should turn out to be dead ends. So I expect the parser to go back and try the next possible approach. Instead, it just reports a syntax error.

Quick summary of the grammar: There are elements separated by '#'. An element may be followed by an optional jump, which is indicated by a single '='. If the element itself contains a '#' or '=', these are escaped by doubling them. To avoid ambiguity, an element is not allowed to end with '#'. So a '###' is always the separator first, then the escaped first character of the next element. A '####' is not a separator, just two escaped '#' inside a name.

The grammar:
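
One way to read the separator rule: in a run of consecutive '#', an odd length means the first '#' is a separator and the rest are escaped pairs, while an even length means escaped pairs only. The odd/even generalization is my assumption, extrapolated from the '###' and '####' cases. Below is a minimal Java sketch of that reading. The names HashRunSplitSketch and splitElements are hypothetical, and the sketch deliberately ignores jumps and '==' escapes, so a jump such as '#d=dest' would come out as its own piece.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the grammar or of ANTLR.
class HashRunSplitSketch {

    // Splits the input at the first '#' of every odd-length run of hashes.
    // Each returned piece keeps its leading separator '#'.
    static List<String> splitElements(String src) {
        List<String> elements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < src.length()) {
            if (src.charAt(i) != '#') {
                current.append(src.charAt(i));
                i++;
                continue;
            }
            // Measure the whole run of consecutive hashes.
            int runStart = i;
            while (i < src.length() && src.charAt(i) == '#') {
                i++;
            }
            int escapedHashes = i - runStart;
            if (escapedHashes % 2 == 1) {
                // Odd run: the first hash is a separator and starts a new element.
                if (current.length() > 0) {
                    elements.add(current.toString());
                }
                current = new StringBuilder("#");
                escapedHashes--;
            }
            // The remaining hashes form escaped pairs inside the current element.
            for (int k = 0; k < escapedHashes; k++) {
                current.append('#');
            }
        }
        if (current.length() > 0) {
            elements.add(current.toString());
        }
        return elements;
    }

    public static void main(String[] args) {
        // prints [#BU/ConfigPath, ###sub]
        System.out.println(splitElements("#BU/ConfigPath###sub"));
    }
}
```

For "#BU/proj##bla#d==u##bla" the same sketch yields "#BU/proj##bla" and "#d==u##bla", which is the split the grammar is supposed to find.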

grammar ConfigPath;
configpath: toplevelement subprojectelement* EOF;
subprojectelement:  '#' path jump?;
toplevelement:      '#' path jump?;
jump:   jumpcommand '=' jumpdestination;
jumpcommand: '#d' | '#devpath';
jumpdestination: NONHASHCHAR+;              
path: pathelement ( '/' pathelement)*;             
pathelement: escapedCharacterHash* escapedCharacter ;
escapedCharacterHash: escapedCharacter | '##';
escapedCharacter: NONHASHCHAR | '==';
NONHASHCHAR: ~('#' | '/' | '=' );
HASH: '#';
EQ: '=';

The tests, with the parser errors as comments:

@Test
public void testTripleHash() throws Exception {
    ConfigpathContext c = parse("#BU/ConfigPath###sub"); 
    // line 1:16 extraneous input '#' expecting {'##', '==', NONHASHCHAR}

    Assert.assertEquals( "#BU/ConfigPath", c.toplevelement().getText() );
    // path() does not include the leading '#' separator of the subprojectelement
    Assert.assertEquals( "##sub", c.subprojectelement().get(0).path().getText() );
}

Since a pathelement cannot end with a hash, the first hash of the triple should close the toplevelement and start the subprojectelement, whose path then begins with an escaped ##.

@Test
public void testDoubleHash() throws Exception {
    ConfigpathContext c = parse("#BU/proj##bla#d==u##bla");
    // line 1:15 mismatched input '==' expecting '='

    Assert.assertEquals( "#BU/proj##bla", c.toplevelement().getText() );
    Assert.assertEquals( "#d==u##bla", c.subprojectelement().get(0).getText() );
}

@Test
public void testJumps() throws Exception {
    ConfigpathContext c = parse("#BU/pro##dla#du##d==la#d=dest");
    // line 1:14 missing '=' at 'u'

    Assert.assertEquals( "#BU/pro##dla", c.toplevelement().getText() );
    Assert.assertEquals( 1, c.subprojectelement().size());
    // path() does not include the leading '#' separator of the subprojectelement
    Assert.assertEquals( "du##d==la", c.subprojectelement().get(0).path().getText() );
    Assert.assertEquals( "dest", c.subprojectelement().get(0).jump().jumpdestination().getText() );
}


private ConfigpathContext parse(String src) {
    ConfigPathParser parser = new ConfigPathParser(new CommonTokenStream(new ConfigPathLexer(new ANTLRInputStream(src))));
    parser.addErrorListener(new BaseErrorListener() {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer, Object offendingSymbol, int line, int charPositionInLine, String msg, RecognitionException e) {
            throw new RuntimeException("line " + line + ":" + charPositionInLine + " " + msg );
        }
    });
    return parser.configpath();
}

Is there any way to change the grammar to accept the tests? Or is Antlr4 just not able to parse such a grammar? Would Antlr3 with backtracking find the solutions?

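The grammar was wrong; thanks to cantSleepNow for pointing that out.

I haven't understood every detail of the problem, but it comes down to ambiguities in the lexer. The lexer tokenizes the input independently of the parser, always takes the longest match, and knows nothing about what the parser expects at that point. With the literals '##', '==', '#d' and '#devpath' defined as tokens, '###' is always split into '##' followed by '#', and '#d==' into '#d' followed by '=='. The parser is able to resolve ambiguities through its alternative to backtracking, but the lexer can't.

So here is the working grammar: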
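
To make the lexer behaviour concrete, here is a rough Java sketch that imitates only the longest-match rule over the literal tokens of the first grammar. It is not ANTLR's lexer and uses no ANTLR API; the names GreedyLexerSketch and tokenize are made up for illustration, and every character that matches no literal is treated as a single NONHASHCHAR token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of maximal-munch tokenization, not ANTLR itself.
class GreedyLexerSketch {

    // Literal tokens that appear in the first (failing) grammar.
    private static final String[] LITERALS = {"#devpath", "#d", "##", "==", "#", "=", "/"};

    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < src.length()) {
            // Pick the longest literal that matches at the current position.
            String best = null;
            for (String literal : LITERALS) {
                if (src.startsWith(literal, pos)
                        && (best == null || literal.length() > best.length())) {
                    best = literal;
                }
            }
            if (best == null) {
                // No literal matched: a single NONHASHCHAR.
                best = src.substring(pos, pos + 1);
            }
            tokens.add(best);
            pos += best.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [##, #, s, u, b]
        System.out.println(tokenize("###sub"));
        // prints [#d, ==, u]
        System.out.println(tokenize("#d==u"));
    }
}
```

The lone '#' in the first result is the token reported as extraneous in testTripleHash, and the '==' in the second result is the mismatched input from testDoubleHash. No amount of lookahead in the parser can undo these token boundaries.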

grammar ConfigPath;

configpath: toplevelement subprojectelement* EOF;

subprojectelement:  '#' path jump?;

toplevelement:      '#' path jump?;

jump:   jumpcommand '=' jumpdestination;

jumpdestination : string;

jumpcommand: HASH D 'devpath'?;

path: pathelement ( '/' pathelement)*;             
pathelement: escapedCharacterHash* escapedCharacter ;

escapedCharacterHash: escapedCharacter | HASH HASH;
escapedCharacter: string | EQ EQ;
string  : (NONHASHCHAR | D)+;
NONHASHCHAR: ~('#' | '/' | '=' | 'd' );
D: 'd';
HASH: '#';
EQ: '=';
