GDL Antlr grammar

I need a parser for the Game Description Language (GDL) in Java.

For this I am currently trying to use ANTLR4.

My current grammar, given below, does not seem to be correct, or at least the generated parser does not recognize a game description, which I also provide below.

The ANTLR4-Grammar:

grammar GDL;

description :  (gdlRule | sentence)+ ;

gdlRule : '(' SP? '<=' SP? sentence (SP literal)* SP? ')';

sentence : propLit | ( '(' relLit ')' );

literal : ( '(' SP? (orLit | notLit | distinctLit | relLit) SP? ')' ) 
| ( '('  (orLit | notLit | distinctLit | relLit) ')' ) 
| propLit;
notLit : 'not' SP literal | '~' literal;
orLit : 'or' SP literal* ;
distinctLit : 'distinct' SP term SP term;
propLit : constant;
relLit : constant (SP term)+;

term : ( '(' funcTerm ')' ) | varTerm | constTerm;
funcTerm : constant (SP term)*;
varTerm : '?' constant;
constTerm : constant;

constant : ident | number;
/* ident is any string of letters, digits, and underscores */
ident: ID;
number: NR;
NR : [0-9]+;
ID : [a-zA-Z] [a-zA-Z0-9]* ;
SP : ' '+;

COMMENT : ';'[A-Za-z0-9; \r\t]*'\n' -> skip;
WS : [ ;\t\r\n]+ -> skip

The game description given in GDL:

;;; Tictictoe

  (role white)
  (role black)


  (init (cell 1 1 b))
  (init (cell 1 2 b))
  (init (cell 1 3 b))
  (init (cell 2 1 b))
  (init (cell 2 2 b))
  (init (cell 2 3 b))
  (init (cell 3 1 b))
  (init (cell 3 2 b))
  (init (cell 3 3 b))
  (init (step 1))


  (<= (next (cell ?j ?k x))
      (true (cell ?j ?k b))
      (does white (mark ?j ?k))
      (does black (mark ?m ?n))
      (or (distinct ?j ?m) (distinct ?k ?n)))

  (<= (next (cell ?m ?n o))
      (true (cell ?m ?n b))
      (does white (mark ?j ?k))
      (does black (mark ?m ?n))
      (or (distinct ?j ?m) (distinct ?k ?n)))

  (<= (next (cell ?m ?n b))
      (true (cell ?m ?n b))
      (does white (mark ?m ?n))
      (does black (mark ?m ?n)))

  (<= (next (cell ?p ?q b))
      (true (cell ?p ?q b))
      (does white (mark ?j ?k))
      (does black (mark ?m ?n))
      (or (distinct ?j ?p) (distinct ?k ?q))
      (or (distinct ?m ?p) (distinct ?n ?q)))

  (<= (next (cell ?m ?n ?w))
      (true (cell ?m ?n ?w))
      (distinct ?w b))

  (<= (next (step ?y))
      (true (step ?x))
      (succ ?x ?y))

  (succ 1 2)
  (succ 2 3)
  (succ 3 4)
  (succ 4 5)
  (succ 5 6)
  (succ 6 7)

  (<= (row ?m ?x)
      (true (cell ?m 1 ?x))
      (true (cell ?m 2 ?x))
      (true (cell ?m 3 ?x)))

  (<= (column ?n ?x)
      (true (cell 1 ?n ?x))
      (true (cell 2 ?n ?x))
      (true (cell 3 ?n ?x)))

  (<= (diagonal ?x)
      (true (cell 1 1 ?x))
      (true (cell 2 2 ?x))
      (true (cell 3 3 ?x)))

  (<= (diagonal ?x)
      (true (cell 1 3 ?x))
      (true (cell 2 2 ?x))
      (true (cell 3 1 ?x)))

  (<= (line ?x) (row ?m ?x))
  (<= (line ?x) (column ?m ?x))
  (<= (line ?x) (diagonal ?x))

  (<= nolinex
      (not (line x)))
  (<= nolineo
      (not (line o)))


  (<= (legal white (mark ?x ?y))
      (true (cell ?x ?y b)))

  (<= (legal black (mark ?x ?y))
      (true (cell ?x ?y b)))


  (<= (goal white 50)
      (line x)
      (line o))

  (<= (goal white 100)
      (line x)

  (<= (goal white 0)
      (line o))

  (<= (goal white 50)

  (<= (goal black 50)
      (line x)
      (line o))

  (<= (goal black 100)
      (line o))

  (<= (goal black 0)
      (line x)

  (<= (goal black 50)


  (<= terminal
      (true (step 7)))

  (<= terminal
      (line x))

  (<= terminal
      (line o))


The error output of the generated parser:

line 24:6 mismatched input '(' expecting {')', SP}
line 27:7 no viable alternative at input '(or'

I don't know what I have to change or how to get a correct grammar.

Any help would be appreciated.

The problem is your handling of whitespace.

You have two rules, one of which creates a token:

SP : ' '+;

and the other one which simply ignores the whitespace:

WS : [ ;\t\r\n]+ -> skip

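ANTLR's lexer picks the rule that matches the longest run of characters and, on a tie, the rule listed first. A run consisting only of spaces matches both rules at the same length, so the first rule applies and you get an SP token. A run that contains a newline or some other character listed in the WS rule (for example, a line break followed by indentation) can only be matched in full by WS, so the entire run of whitespace is ignored.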

Since your grammar insists on SP tokens at certain points, the ignored whitespace will cause a syntax error.
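
To make the effect concrete, here is a minimal sketch in plain Java that imitates how the two rules compete. It is not ANTLR itself: it assumes the lexer's usual behaviour (longest match wins, the earlier rule wins a tie), models only a handful of the question's tokens with java.util.regex, and the class and method names (LongestMatchDemo, tokenize) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Rule names and patterns in the same relative order as the question's grammar.
    // Only the tokens needed for this demo are modelled (an assumption of the sketch).
    static final String[] NAMES = {"(", ")", "<=", "NR", "ID", "SP", "COMMENT", "WS"};
    static final Pattern[] RULES = {
        Pattern.compile("\\("),
        Pattern.compile("\\)"),
        Pattern.compile("<="),
        Pattern.compile("[0-9]+"),
        Pattern.compile("[a-zA-Z][a-zA-Z0-9]*"),
        Pattern.compile(" +"),                        // SP : ' '+
        Pattern.compile(";[A-Za-z0-9; \\r\\t]*\\n"),  // COMMENT (skipped)
        Pattern.compile("[ ;\\t\\r\\n]+")             // WS (skipped)
    };

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                // Strict '>' keeps the earlier rule when two rules match the same length.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no rule matches at offset " + pos);
            }
            String name = NAMES[best];
            // COMMENT and WS are '-> skip' rules, so they never reach the parser.
            if (!name.equals("COMMENT") && !name.equals("WS")) {
                tokens.add(name);
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Everything on one line: each gap is a plain run of spaces, so SP wins the tie.
        System.out.println(tokenize("(<= a b)"));     // [(, <=, SP, ID, SP, ID, )]
        // Line break plus indentation: only WS matches the whole run, so no SP is produced.
        System.out.println(tokenize("(<= a\n  b)"));  // [(, <=, SP, ID, ID, )]
    }
}
```

In the second call the parser sees two ID tokens back to back where the gdlRule from the question expects an SP in between, which is the same kind of mismatch as in the reported errors.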

There is no reason that I can see to complicate your grammar with explicit whitespace. I would get rid of SP, remove all references to it in your grammar, and just let WS ignore whitespace.

I would also remove the semicolon from WS to avoid interactions with COMMENT. [Note 1] And I would simplify COMMENT so that it just ignores everything from a semicolon to the end of the line, rather than giving a list of valid comment characters. (What if you want to put a comma or a * in a comment?)


  1. You would see this problem if there were a newline at the beginning of the file, with the row of semicolons at line 2. Then COMMENT does not match at the first character, but WS does. WS will then match (and ignore) the newline, the row of semicolons, the next newline, the semicolons at the beginning of the next line, and the following space, leaving Tictictoe to be scanned as an ID, which will cause a parse error.

    You would also see it if any other comment were something other than a row of semicolons. These are currently being scanned as WS, starting with the newline before the comment. That happens to be OK, since the comment only includes semicolons. But any other non-whitespace character would terminate the WS and then be unexpectedly parsed as program text.

(At least) 3 things are incorrect:

  • you include ; in your WS rule, and it is also the start of your COMMENT
  • your COMMENT rule says it needs to end with a line break. However, line breaks are already included in the WS rule, and it would disallow comments that end with EOF (without a line break)
  • SP is not needed: spaces need to be skipped and not included in your parser rules

Try something like this instead:

grammar GDL;

description :  (gdlRule | sentence)+ ;

gdlRule : '(' '<=' sentence literal* ')';

sentence : propLit | ( '(' relLit ')' );

literal : ( '(' (orLit | notLit | distinctLit | relLit) ')' )
 | propLit;

notLit : 'not' literal | '~' literal;
orLit : 'or' literal* ;
distinctLit : 'distinct' term term;
propLit : constant;
relLit : constant (term)+;
term : ( '(' funcTerm ')' ) | varTerm | constTerm;
funcTerm : constant (term)*;
varTerm : '?' constant;
constTerm : constant;
constant : ident | number;
ident: ID;
number: NR;

NR : [0-9]+;
ID : [a-zA-Z] [a-zA-Z0-9]*;
COMMENT : ';'[A-Za-z0-9; \r\t]* -> skip;
WS : [ \t\r\n]+ -> skip;

