ANTLR4 Lexer doesn't like Cisco ACE

So, I'm trying to parse something like this.

    permit 16 any eq 30 www any eq 80 established log-input

The parse tree I'm aiming for looks like this (actual output from the test rig).

As you can see, the 16 is my problem. I've nested rules, and it doesn't like it.

The relevant section...

    ace  : remarks? action source destination ops;
    action: ( P | D ) PROTO ;
    P : 'permit' ;
    D : 'deny' ;
    NUMBER : [0-9]+ ;
    PROTO : 'ip'
          | 'tcp'
          | 'udp'
          | 'eigrp'
          | 'icmp'
          | NUMBER
          ;
    ID : [a-zA-Z-]+ ;

If NUMBER is first, I get the red 16; if PROTO is first, then all the ports downstream turn red.

I get that it's just running my lexer rules in order, and they are ambiguous. PROTO can match any number, and so can NUMBER.
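
To make the ambiguity concrete: an ANTLR 4 lexer picks the rule that matches the longest run of characters, and when two rules match the same length, the one declared first wins. The sketch below imitates that choice with plain Java regexes. It is only an approximation for illustration; the class `LexerTieBreakDemo` and its methods are hypothetical names and not part of ANTLR.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of ANTLR: it imitates the lexer's
// "longest match wins, earliest rule breaks ties" choice with plain regexes.
final class LexerTieBreakDemo {

    // Regex stand-ins for the lexer rules in the question.
    static final String NUMBER = "[0-9]+";
    static final String PROTO = "ip|tcp|udp|eigrp|icmp|[0-9]+";
    static final String ID = "[a-zA-Z-]+";

    // Rule order as in the grammar above: NUMBER is declared before PROTO.
    static Map<String, String> numberFirst() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("NUMBER", NUMBER);
        rules.put("PROTO", PROTO);
        rules.put("ID", ID);
        return rules;
    }

    // Same rules with PROTO moved above NUMBER.
    static Map<String, String> protoFirst() {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("PROTO", PROTO);
        rules.put("NUMBER", NUMBER);
        rules.put("ID", ID);
        return rules;
    }

    // Name of the rule that would claim the start of `input`, or null if none matches.
    static String winner(Map<String, String> rulesInOrder, String input) {
        String best = null;
        int bestLength = 0;
        for (Map.Entry<String, String> rule : rulesInOrder.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(input);
            // Strictly greater: an equally long later rule never displaces an earlier one.
            if (m.lookingAt() && m.end() > bestLength) {
                best = rule.getKey();
                bestLength = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"16", "30", "80", "tcp"}) {
            System.out.println(text + " -> NUMBER first: " + winner(numberFirst(), text)
                    + ", PROTO first: " + winner(protoFirst(), text));
        }
    }
}
```

With NUMBER first, `16` can never be a PROTO; with PROTO first, the port numbers `30` and `80` become PROTO as well. The lexer decides without knowing what the parser expects at that position, so neither ordering can give `16` one type after `permit` and another type after `eq`.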

However, I tried to solve that by nesting them, and with fragments, to no avail.

    ace  : remarks? action source destination ops;
    action: ( P | D ) ;
    P : 'permit' PROTO;
    D : 'deny' PROTO;
    NUMBER : [0-9]+ ;
    fragment PROTO : 'ip'
          | 'tcp'
          | 'udp'
          | 'eigrp'
          | 'icmp'
          | NUMBER
          ;
    ID : [a-zA-Z-]+ ;

As soon as I do that, my 'catch-all' ID starts gobbling everything up. It's still in a tree, but all my token types turn to IDs.
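
One likely explanation, offered as an assumption because the whitespace rule is not shown here: a fragment is inlined into the rule that uses it, so `P : 'permit' PROTO ;` only matches `permit` immediately followed by a protocol, with no space in between. On `permit 16` that rule fails and ID is the only rule left that matches `permit`. Since a fragment never produces a token of its own, `tcp`, `udp` and the other names fall through to ID too. The sketch below approximates the two rules with Java regexes; `FragmentGlueDemo` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: regex approximations of two lexer rules from the
// second grammar, used to show which one can match at the start of a line.
final class FragmentGlueDemo {

    // P : 'permit' PROTO ;  with the PROTO fragment inlined.
    static final Pattern P = Pattern.compile("permit(?:ip|tcp|udp|eigrp|icmp|[0-9]+)");

    // ID : [a-zA-Z-]+ ;
    static final Pattern ID = Pattern.compile("[a-zA-Z-]+");

    // Number of leading characters of `input` that `rule` matches, or 0 if none.
    static int matchLength(Pattern rule, String input) {
        Matcher m = rule.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        String line = "permit 16 any";
        // The space after "permit" stops P, so only ID matches here.
        System.out.println("P  on \"" + line + "\": " + matchLength(P, line));
        System.out.println("ID on \"" + line + "\": " + matchLength(ID, line));
        // P only matches when the protocol is glued onto the keyword.
        System.out.println("P  on \"permit16\": " + matchLength(P, "permit16"));
    }
}
```

Lexer rules describe single tokens. Sequences of tokens separated by whitespace, such as an action followed by a protocol, belong in parser rules.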

I've looked around this forum, and the gargler, for hours, and I haven't seen any way to sort this out. Oddly, however, the behavior I want is working elsewhere in this same grammar.

    destination : address ports? ;
    address : ADDRESS ADDRESS | HST ADDRESS | ANY ;
    ADDRESS : QUAD DOT QUAD DOT QUAD DOT QUAD ;
    fragment QUAD : TWO LO5 LO5 | TWO LO4 DIG | ONE DIG DIG | DIG DIG | DIG ;
    fragment DOT : '.' ;
    fragment ONE : [1] ;
    fragment TWO : [2] ;
    fragment LO4 : [0-4] ;
    fragment LO5 : [0-5] ;
    fragment DIG : [0-9] ;

That works like a champ: it only grabs IPs and host addresses, without failure. Of course the 'ports?' section still goes to garbage. However, using that same setup, I can't seem to grab PORT/PROTOCOL.

I'm missing something fundamental, and after rearranging this thing for far too long... I'm wondering whether I should stop trying to get such specific token types and handle it in post (i.e., with a listener later), or whether my tree should contain proper token tags.

** Technically, protocols addressed by number should be < 256, so they are a QUAD as I've defined them, but I can't get that working...

Ideas? Suggestions? I have the tree, so who cares if it's a number in that spot? I know the parent is action, so should the right-hand child, being a number, be validated as less than 256 later? I assume the ambiguity is what's killing it; could I redesign this thing to remove all ambiguity somehow?
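
If the range check is deferred to a listener, as floated above, the check itself is small. Here is a minimal sketch, assuming the protocol arrives as the text of a plain NUMBER token; `ProtocolNumberCheck` and `isValidProtocolNumber` are hypothetical names, and the 0 to 255 bound comes from the "< 256" limit stated above.

```java
// Hypothetical helper for a listener or visitor: validates the text of a
// numeric protocol token after parsing, instead of encoding the range in the lexer.
final class ProtocolNumberCheck {

    // True when `text` is a decimal number in the range 0..255.
    static boolean isValidProtocolNumber(String text) {
        // One to three digits only; this also rules out signs, spaces and names.
        if (text == null || !text.matches("[0-9]{1,3}")) {
            return false;
        }
        return Integer.parseInt(text) <= 255;
    }

    public static void main(String[] args) {
        for (String text : new String[] {"16", "255", "256", "tcp"}) {
            System.out.println(text + " -> " + isValidProtocolNumber(text));
        }
    }
}
```

This keeps the lexer down to a single unambiguous NUMBER rule and moves the "which numbers are legal here" question to a place where the surrounding rule is known.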

(BTW I'm a self-taught novice, so try and speak to me like I have no college education in computer science, I've never read the dragon book, and I've been programming with ANTLR for 4 days... because that is who you're talking to.)

I know this is an old thread, but I hope this helps someone. You can actually use a catch-all token to parse most values, and then convert it to more specific values (enums, constants, etc.) in your visitor/listener code. This is a stripped-down example of a grammar that worked for me:

Base Grammar:


    lexer grammar Base ;
    
    fragment LOWERCASE  : [a-z];
    fragment UPPERCASE  : [A-Z];
    fragment NUMBER     : [0-9]+;
    fragment WORD       : (LOWERCASE | UPPERCASE | NUMBER | '-' | '_' | '/')+;
    fragment NEWLINE    : '\r' '\n'
        | '\n'
        | '\r';
    
    fragment OBJECT_DESCRIPTION     : ' description';
    
    CRLF
        : NEWLINE ;
    
    VALUE
        : (WORD | '.')+ ;
    
    WHITESPACE
        : ' '   -> skip ;
    
    IGNORE
        : .     -> skip ;
    

Access rule grammar:


    grammar AccessList;
    
    import Base;
    
    //access-list acl-1 extended permit udp | object network-object-1 eq 123 (source)| 1.1.1.1 ne www (destination)|
    
    accessLists             : (accessListDestination)+;
    accessListDestination   : accessListSource accessListTarget (' rule-id' aclId = VALUE)?;
    accessListSource        : accessListProtocol accessListTarget;
    accessListTarget: (
            ({_input.LT(1).getText().matches("object|object-group")}? objectType = VALUE objectName = VALUE)
            |({_input.LT(1).getText().matches("host")}? host=VALUE ip=VALUE) | ip = VALUE 
        ) accessListPorts?;
    accessListPorts         : (accessListPort | accessListPortRange);
    accessListPortRange     : VALUE startPort=VALUE endPort=VALUE;
    accessListPort          : operatorOrObjectType=VALUE portOrPortGroup=VALUE;
    
    accessListProtocol:
        ACCESS_LIST_KEY name = VALUE accessListType = ACCESS_LIST_TYPE action = VALUE protocol = VALUE accessListInterface?;
    
    accessListInterface: 'ifc' accessListInterfaceName=VALUE;
    
    fragment ACCESS_LIST    : 'access-list';
    fragment EXTENDED       : 'extended';
    fragment ADVANCED       : 'advanced';
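
The conversion step mentioned at the start of this answer (catch-all VALUE text to enums) is not part of the grammar itself. A minimal sketch of what it could look like in visitor or listener code follows. The class `AclValueMapper`, its enums and the method `toEnum` are hypothetical names chosen for illustration, and the enum constants are taken from the keywords that appear in this thread.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: turns the text of a catch-all token into a specific
// enum constant, or an empty Optional when the text is not a known keyword.
final class AclValueMapper {

    enum Action { PERMIT, DENY }

    enum Protocol { IP, TCP, UDP, EIGRP, ICMP }

    static <E extends Enum<E>> Optional<E> toEnum(Class<E> type, String valueText) {
        try {
            // Enum constants are upper case, while the configuration keywords are lower case.
            return Optional.of(Enum.valueOf(type, valueText.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            // Not a keyword of this enum (for example a number or an object name).
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toEnum(Action.class, "permit"));
        System.out.println(toEnum(Protocol.class, "udp"));
        System.out.println(toEnum(Protocol.class, "16"));
    }
}
```

In a visitor, the argument would be the text of a labeled token, for instance the one labeled `action` or `protocol` in `accessListProtocol`. An empty result signals a value that needs different handling, such as a numeric protocol.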
