
Negate POSIX or Unicode character classes in ANTLR Lexer (C#)

ANTLR build system:
Visual Studio 2017, C#
NuGet packages: Antlr4.CodeGenerator 4.6.5-rc002, Antlr4.Runtime 4.6.5-rc002

I've got the following Flex rule which I'd like to convert to ANTLR 4:

NOT_NAME    [^[:alpha:]_*\n]+

As far as I can tell, ANTLR doesn't support POSIX or Unicode character classes directly, but you can define fragments that emulate them in your lexer grammar.

In my attempt to translate the above rule I've already created the following fragments:

fragment ALPHA: L | Nl;
fragment L  : Ll | Lm | Lo | Lt | Lu ;
fragment Ll : '\u0061'..'\u007A' ; /* rest omitted for brevity */
fragment Lm : '\u02B0'..'\u02C1' ; /* rest omitted for brevity */
fragment Lo : '\u00AA' | '\u00BA' ; /* rest omitted for brevity */
fragment Lt : '\u01C5' | '\u01C8' ; /* rest omitted for brevity */
fragment Lu : '\u0041'..'\u005A'  ; /* rest omitted for brevity */
fragment Nl : '\u16EE'..'\u16F0'  ; /* rest omitted for brevity */

The ANTLR rule I had thought would work was the following:

NOT_NAME: ~(ALPHA | '_' | '*' | '\n')+;

but it gives me the following error:

rule reference 'ALPHA' is not currently supported in a set

The problem seems to be the negation: the same rule references work fine in rules that aren't negated.

I know that it works if I inline all the above fragments into one rule but this appears insanely complicated to me - especially given the pretty simple and straightforward Flex rule.
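For illustration, a minimal sketch of what the inlined form looks like, restricted to the few code points quoted in the fragments above. It is deliberately incomplete: a real rule would have to list every range that was "omitted for brevity", which is what makes this approach unwieldy.

```
// Sketch only: covers just the ranges shown in the fragments above.
// Order: Ll, Lu, Lo, Lm, Lt, Nl, then the literal '_', '*' and newline.
NOT_NAME
    : ~[\u0061-\u007A\u0041-\u005A\u00AA\u00BA\u02B0-\u02C1\u01C5\u01C8\u16EE-\u16F0_*\n]+
    ;
```

Because the set contains only literal characters and ranges, with no rule references, the negation is accepted.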

I must be missing some elegant trick that you will possibly point me to.

Unicode character set support doesn't depend on the target runtime. The ANTLR 4 tool itself converts the grammar and parses the character set definitions, so you should be able to use any of the Unicode classes laid out in the lexer documentation. I'm not sure, however, whether you can negate such a block with the tilde. At the very least, there is the option to use \P{...} to negate a character class (also mentioned in that document).
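A minimal sketch of how the Flex rule might translate with property escapes. It rests on two assumptions: that the tool version in use accepts \p{...} inside a [...] set (to my knowledge these escapes arrived with ANTLR 4.7, so the 4.6.5 packages in the question may not accept them), and that the tilde can be applied to such a set, which is not verified here. The grammar name NotNameSketch is hypothetical.

```
lexer grammar NotNameSketch;

// Assumption: \p{...} escapes are accepted inside a [...] set and the
// whole set can be negated with '~'.
// \p{L} covers Ll, Lm, Lo, Lt and Lu; \p{Nl} is the letter-number category,
// matching the ALPHA fragment from the question.
NOT_NAME : ~[\p{L}\p{Nl}_*\n]+ ;
```

Note that \P{L} on its own only expresses "not a letter". The extra exclusions for '_', '*' and the newline still need the negated set, because a lexer set cannot intersect two classes.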
