
ANTLR4 Lexer doesn't like Cisco ACE

So, I'm trying to parse something like this:

permit 16 any eq 30 www any eq 80 established log-input

The parse tree I'm aiming for looks like this. [Image: actual output from the test rig]

As you can see, the 16 is my problem. I've nested rules, and it doesn't like it.

The relevant section...

ace  : remarks? action source destination ops;
action: ( P | D ) PROTO ;
P : 'permit' ;
D : 'deny' ;
NUMBER : [0-9]+ ;
PROTO : 'ip'
      | 'tcp'
      | 'udp'
      | 'eigrp'
      | 'icmp'
      | NUMBER
      ;
ID : [a-zA-Z-]+ ;

If NUMBER comes first, the 16 shows up in red (an error); if PROTO comes first, all the port numbers downstream turn red instead.

I get that it's just running my lexer rules in order, and that they are ambiguous: PROTO can match any number, and so can NUMBER.
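The ambiguity described above can be reproduced with a toy tokenizer. This is a plain-Python sketch, not ANTLR itself: it assumes the lexer takes the longest match at each position and breaks ties in favour of the rule listed first. The `tokenize` helper and the regex versions of the rules are illustrative names, not part of the grammar.

```python
import re

def tokenize(text, rules):
    """Toy lexer: longest match wins; on a tie, the rule listed first wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():          # assume whitespace is skipped
            pos += 1
            continue
        best_name, best_len = None, 0
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            # Strictly longer only, so an earlier rule keeps a tie.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Regex stand-ins for the lexer rules in the question.
P = ("P", r"permit")
NUMBER = ("NUMBER", r"[0-9]+")
PROTO = ("PROTO", r"ip|tcp|udp|eigrp|icmp|[0-9]+")
ID = ("ID", r"[a-zA-Z-]+")

line = "permit 16 any eq 30"
number_first = tokenize(line, [P, NUMBER, PROTO, ID])
proto_first = tokenize(line, [P, PROTO, NUMBER, ID])

print(number_first)  # every run of digits is NUMBER, including the protocol 16
print(proto_first)   # every run of digits is PROTO, including the port 30
```

Whichever rule comes first claims every run of digits, because the lexer decides on characters alone and has no idea where in the `ace` rule the token will end up. Only the parser knows that position, which is why reordering the two lexer rules just moves the error around.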

However, I tried to solve that by nesting them and using fragments, to no avail.

ace  : remarks? action source destination ops;
action: ( P | D ) ;
P : 'permit' PROTO;
D : 'deny' PROTO;
NUMBER : [0-9]+ ;
fragment PROTO : 'ip'
      | 'tcp'
      | 'udp'
      | 'eigrp'
      | 'icmp'
      | NUMBER
      ;
ID : [a-zA-Z-]+ ;

As soon as I do that, my 'catch-all' ID starts gobbling everything up. I still get a tree, but all my token types turn into IDs.
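One likely reason for this, assuming whitespace is skipped by a lexer rule not shown here: a fragment is inlined into the token that references it, so `P : 'permit' PROTO ;` only matches text such as `permit16` with nothing in between. The bare word `permit` no longer matches P at all, so ID claims it. The sketch below uses regex stand-ins for the rules; the `matches` helper is hypothetical.

```python
import re

# Regex stand-ins for the lexer rules, with the PROTO fragment inlined.
PROTO = r"(?:ip|tcp|udp|eigrp|icmp|[0-9]+)"
P_WITH_FRAGMENT = "permit" + PROTO   # P  : 'permit' PROTO ;
ID = r"[a-zA-Z-]+"                   # ID : [a-zA-Z-]+ ;

def matches(pattern, text):
    """True if the whole text is one token of the given rule."""
    return re.fullmatch(pattern, text) is not None

print(matches(P_WITH_FRAGMENT, "permit"))    # False: PROTO must follow directly
print(matches(P_WITH_FRAGMENT, "permit16"))  # True: no space allowed in between
print(matches(ID, "permit"))                 # True: so ID wins for "permit"
```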

I've looked around this forum and the gargler for hours, and I haven't seen any way to sort this out. Oddly, though, the behavior I want is working elsewhere in this same grammar.

destination : address ports? ;
address : ADDRESS ADDRESS | HST ADDRESS | ANY ;
ADDRESS : QUAD DOT QUAD DOT QUAD DOT QUAD ;
fragment QUAD : TWO LO5 LO5 | TWO LO4 DIG | ONE DIG DIG | DIG DIG | DIG ;
fragment DOT : '.' ;
fragment ONE : [1] ;
fragment TWO : [2] ;
fragment LO4 : [0-4] ;
fragment LO5 : [0-5] ;
fragment DIG : [0-9] ;

That works like a champ: it only grabs IPs and host addresses, without failure. Of course the 'ports?' section still goes to garbage. However, using that same setup, I can't seem to grab PORT/PROTOCOL.

I'm missing something fundamental, and after rearranging this thing for far too long... I'm wondering if I should not try and get such specific TOKEN IDs, and handle it in post (aka, with a listener later) or if my tree should contain proper token tags.

** Technically protocols addressed by number should be < 256 so they are a QUAD as I've defined them, but I can't get that working...

Ideas? Suggestions? I have the tree, so who cares if it's a number in that spot? I know the parent is action, so should the right-hand child being a number just be validated as less than 256 later? I assume the ambiguity is killing it; could I redesign this thing to remove all the ambiguity somehow?
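Validating "in post", as floated above, can be quite small. This is a minimal sketch in plain Python, assuming the lexer hands back the protocol as ordinary text (a generic NUMBER or word token) and that a listener calls the check once it knows the token sits under `action`. The function name and the `NAMED_PROTOCOLS` set are illustrative; the names are taken from the PROTO rule in the question.

```python
import re

# Protocol keywords listed in the question's PROTO rule.
NAMED_PROTOCOLS = {"ip", "tcp", "udp", "eigrp", "icmp"}

def is_valid_protocol(text):
    """Accept a known protocol name or a decimal protocol number below 256."""
    if text in NAMED_PROTOCOLS:
        return True
    if re.fullmatch(r"[0-9]+", text):
        return int(text) < 256
    return False

print(is_valid_protocol("16"))   # True
print(is_valid_protocol("tcp"))  # True
print(is_valid_protocol("256"))  # False: out of range
print(is_valid_protocol("www"))  # False: a port keyword, not a protocol
```

With this split, the lexer only has to say "this is a number", and the meaning of the number comes from its position in the tree.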

(BTW I'm a self taught novice, so try and speak to me like I have no college education in computer science, I've never read the dragon book, and I've been programming with ANTLR for 4 days... because that is who you're talking to.)

I know this is an old thread, but I hope this helps someone. You can actually use a catch-all token to parse most values, and then convert it to more specific values (enums, constants, etc.) in your visitor/listener code. This is a stripped-down example of the grammar that worked for me.

Base Grammar:


    lexer grammar Base ;
    
    fragment LOWERCASE  : [a-z];
    fragment UPPERCASE  : [A-Z];
    fragment NUMBER     : [0-9]+;
    fragment WORD       : (LOWERCASE | UPPERCASE | NUMBER | '-' | '_' | '/')+;
    fragment NEWLINE    : '\r' '\n'
        | '\n'
        | '\r';
    
    fragment OBJECT_DESCRIPTION     : ' description';
    
    CRLF
        : NEWLINE ;
    
    VALUE
        : (WORD | '.')+ ;
    
    WHITESPACE
        : ' '   -> skip ;
    
    IGNORE
        : .     -> skip ;
    

Access rule grammar:


    grammar AccessList;
    
    import Base;
    
    //access-list acl-1 extended permit udp | object network-object-1 eq 123 (source)| 1.1.1.1 ne www (destination)|
    
    accessLists             : (accessListDestination)+;
    accessListDestination   : accessListSource accessListTarget (' rule-id' aclId = VALUE)?;
    accessListSource        : accessListProtocol accessListTarget;
    accessListTarget: (
            ({_input.LT(1).getText().matches("object|object-group")}? objectType = VALUE objectName = VALUE)
            |({_input.LT(1).getText().matches("host")}? host=VALUE ip=VALUE) | ip = VALUE 
        ) accessListPorts?;
    accessListPorts         : (accessListPort | accessListPortRange);
    accessListPortRange     : VALUE startPort=VALUE endPort=VALUE;
    accessListPort          : operatorOrObjectType=VALUE portOrPortGroup=VALUE;
    
    accessListProtocol:
        ACCESS_LIST_KEY name = VALUE accessListType = ACCESS_LIST_TYPE action = VALUE protocol = VALUE accessListInterface?;
    
    accessListInterface: 'ifc' accessListInterfaceName=VALUE;
    
    fragment ACCESS_LIST    : 'access-list';
    fragment EXTENDED       : 'extended';
    fragment ADVANCED       : 'advanced';
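The conversion step this answer describes, turning catch-all `VALUE` text into enums or constants in listener code, might look roughly like the sketch below. It is plain Python and independent of the ANTLR runtime. `Action`, `PORT_NAMES`, `to_action`, and `to_port` are hypothetical names, and the port table is only an illustrative subset.

```python
from enum import Enum

class Action(Enum):
    PERMIT = "permit"
    DENY = "deny"

# Illustrative subset of port keywords; a real table would be longer.
PORT_NAMES = {"www": 80}

def to_action(text):
    """Map the text of an action token to an enum; ValueError if unknown."""
    return Action(text.lower())

def to_port(text):
    """Map a port token to a number: digits as-is, keywords via the table."""
    if text.isascii() and text.isdigit():
        return int(text)
    return PORT_NAMES[text.lower()]   # KeyError if the keyword is unknown

print(to_action("permit"))  # Action.PERMIT
print(to_port("www"))       # 80
print(to_port("123"))       # 123
```

A listener or visitor would call helpers like these with the text of the labelled tokens (for example the `action` label in `accessListProtocol`), so the grammar stays permissive and the strictness lives in ordinary code.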
