
ANTLR: common token definitions

I'd like to define common token constants in a single central ANTLR file, so that I can define several different lexers and parsers and mix and match them at runtime. As long as they all share a common set of token definitions, they should work together.

In other words, I want to see public static final int WORD = 2; in each lexer, so they all agree that a "2" is a WORD.

I created a file named CommonTokenDefs.g4 and added a section like this:

tokens {
WORD, NUMBER
}

and included

options { tokenVocab = CommonTokenDefs; }

in each of my other .g4 files. It doesn't work. A .g4 file that sets tokenVocab still assigns its own integer constant to any token type it defines, and worse, its generated .tokens file contains duplicate values:

FOO=1
BAR=2
WORD=1
NUMBER=2

Using import CommonTokenDefs; doesn't work either: if I define a token type in the lexer that is already in CommonTokenDefs, I get a "token name FOO is already defined" error.

How do I create a common vocabulary across lexers and parsers?

Importing a grammar means merging it. The imported grammar is not a separate instance; it enriches the grammar that imports it. The importing grammar numbers its tokens based on what is defined in it, and then adds the tokens from the imported grammar.

The only solution I see here is to use a single lexer grammar for all your parsers, if that is possible. You can implement certain variations in your lexer by using different base lexers (ANTLR option: superClass), but that is of course limited and, in particular, doesn't let you add more tokens.

Update

Actually, there is a way to make it work the way you want. In addition to the import statement (which is used to import grammars), there is the tokenVocab grammar option, which loads a *.tokens file that assigns numeric values to tokens. With such a token vocabulary you can predefine which value ANTLR should use for each token, and so ensure that certain tokens always get the same numeric value. See the generated *.tokens files for the required format.
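As a rough illustration of that format, here is a small Java sketch that parses the text of two *.tokens files (one NAME=number entry per line, as in the listing in the question) and checks that they agree on every token name they share. The class TokenVocab and its methods parse and agree are hypothetical helpers written for this example, not part of ANTLR, and the sample vocabularies are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of ANTLR: reads .tokens text and compares vocabularies.
final class TokenVocab {

    /** Parses the text of a .tokens file into a name -> token type map. */
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> types = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Split on the last '=' so that literal entries such as '='=5 still parse.
            int eq = line.lastIndexOf('=');
            if (eq <= 0) {
                continue; // not a NAME=number entry, skip it
            }
            String name = line.substring(0, eq);
            int type = Integer.parseInt(line.substring(eq + 1).trim());
            types.put(name, type);
        }
        return types;
    }

    /** Returns true if every name present in both vocabularies has the same token type. */
    static boolean agree(Map<String, Integer> a, Map<String, Integer> b) {
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null && !other.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up vocabularies: a shared one and two lexer-specific ones.
        Map<String, Integer> common = parse("WORD=1\nNUMBER=2\n");
        Map<String, Integer> consistent = parse("WORD=1\nNUMBER=2\nFOO=3\n");
        Map<String, Integer> conflicting = parse("FOO=1\nBAR=2\nWORD=3\n");

        System.out.println("consistent lexer agrees:  " + agree(common, consistent));
        System.out.println("conflicting lexer agrees: " + agree(common, conflicting));
    }
}
```

A check like this could run as part of a build, after ANTLR has generated the *.tokens files, to catch lexers whose numbering has drifted from the shared vocabulary.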

I use *.tokens files to assign numeric values so that certain keywords fall into a contiguous value range, which allows for efficient checks later, like:

if (token >= KW_1 && token < KW100) ...

That wouldn't be possible if ANTLR freely assigned values to each of the keywords.
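To make the range check concrete, here is a minimal Java sketch. It assumes a hand-written *.tokens file has pinned three keywords to consecutive values; the names KW_IF, KW_ELSE, KW_WHILE, IDENTIFIER, KW_FIRST and KW_END, and the numbers themselves, are invented for illustration. In a real project these constants would come from the generated lexer rather than being declared by hand.

```java
// Hypothetical constants mirroring a hand-written .tokens file such as:
//   KW_IF=10
//   KW_ELSE=11
//   KW_WHILE=12
//   IDENTIFIER=13
final class KeywordRange {
    static final int KW_IF = 10;
    static final int KW_ELSE = 11;
    static final int KW_WHILE = 12;
    static final int IDENTIFIER = 13;

    // Half-open range [KW_FIRST, KW_END) covering all keywords.
    static final int KW_FIRST = KW_IF;
    static final int KW_END = KW_WHILE + 1;

    /** Two comparisons replace a lookup in a set of keyword token types. */
    static boolean isKeyword(int tokenType) {
        return tokenType >= KW_FIRST && tokenType < KW_END;
    }

    public static void main(String[] args) {
        System.out.println("KW_ELSE is keyword:    " + isKeyword(KW_ELSE));
        System.out.println("IDENTIFIER is keyword: " + isKeyword(IDENTIFIER));
    }
}
```

The check only stays valid while the keywords remain contiguous, which is why the values need to be fixed in the vocabulary rather than left to automatic numbering.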
