
Lark Parser: No terminal defined for ':' (Seeming bias against colon character ":")

I have the following rule (taken from SMTP, RFC 5321):

!path : "<" [ a_d_l ":" ] mailbox ">"

When I try to parse this line:

<test.com:test.test@testtest.com>

I get the following error:

No terminal defined for ':'

Oddly, if I just change ":" to "_", it somehow works:

!path : "<" [ a_d_l "_" ] mailbox ">"
<test.com_test.test@testtest.com>

A line that omits the [ a_d_l ":" ] part (which is optional, as the [] indicates) also works:

!path : "<" [ a_d_l ":" ] mailbox ">"
<test.test@testtest.com>

I have also tried defining a terminal for the colon, but that does not work either:

!path : "<" [ a_d_l COLON ] mailbox ">"
COLON : ":"
<test.test@testtest.com>

Minimal reproducible example:

As requested in the comments.

import traceback  # needed for traceback.print_exc() below

from lark import Lark

grammar = r'''
!path               : "<" [ a_d_l ":" ] mailbox ">"
a_d_l               : at_domain ( "," at_domain )*    
at_domain           : "@" domain

domain                  : sub_domain ("." sub_domain)*
sub_domain              : let_dig [ldh_str]
let_dig                 : ALPHA | DIGIT
!ldh_str                 : ( ALPHA | DIGIT | "-" )* let_dig
address_literal         : "[" ( ipv4_address_literal | ipv6_address_literal | general_address_literal ) "]"
ipv4_address_literal    : snum ("."  snum)~3
snum                    : DIGIT~1..3
ipv6_address_literal    : "ipv6:" ipv6_addr
ipv6_addr               : ipv6_full | ipv6_comp | ipv6v4_full | ipv6v4_comp
ipv6_full               : ipv6_hex (":" ipv6_hex)~7
ipv6_hex                : HEXDIG~1..4
!ipv6_comp               : [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]
!ipv6v4_full             : ipv6_hex (":" ipv6_hex)~5 ":" ipv4_address_literal
!ipv6v4_comp             : [ipv6_hex (":" ipv6_hex)~0..3] "::" [ipv6_hex (":" ipv6_hex)~0..3 ":"] ipv4_address_literal
!general_address_literal : standardized_tag ":" dcontent+
standardized_tag        : ldh_str
dcontent                : /[\x21-\x5A|\x5E-\x7E]/

mailbox        : local_part /[\x40]/ ( domain | address_literal ) 
local_part     : dot_string | quoted_string 

dot_string     : atom ("."  atom)*
atom           : atext+
quoted_string  : /[\x22]/ qcontentsmtp* /[\x22]/
qcontentsmtp   : qtextsmtp | quoted_pairsmtp
quoted_pairsmtp  : /[\x5C\x5C]/ /[\x20-\x7E]/
qtextsmtp      : /[\x20-\x21|\x23-\[\]-\x7E]/
atext          : /[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]/

command : [ path ]
%import common.WS       -> SP
%import common.NEWLINE  -> CRLF
%import common.DIGIT
%import common.LETTER   -> ALPHA
%import common.HEXDIGIT -> HEXDIG'''

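# Renamed from `input` so the built-in of the same name is not shadowed.
text = "<test.com:test.test@testtest.com>"

try:
    result = Lark(grammar, start="command").parse(text)
except Exception as ex:
    print('####### Parsing Failed')
    print(ex)
    traceback.print_exc()
    result = None
# The snippet runs at module level, so print the result instead of returning it.
print(result)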
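
The rules

!path               : "<" [ a_d_l ":" ] mailbox ">"
a_d_l               : at_domain ( "," at_domain )*    
at_domain           : "@" domain

can only match "<@test.com:test.test@testtest.com>". They cannot match "<test.com:test.test@testtest.com>", because the text after "<" does not start with an at_domain (which requires a leading "@"). The parser is therefore left with the "<" mailbox alternative, where an unquoted local part cannot contain ":".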
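
The difference between ":" and "_" can be checked directly against the atext character class from the question's grammar. The following is a minimal sketch, assuming the pattern behaves the same under Python's re module as it does inside Lark; the helper name in_atext is made up for illustration. It suggests that "_" only appears to work: it is an ordinary atext character, so "test.com_test.test" is probably consumed as the mailbox's local part rather than as a_d_l followed by "_".

```python
import re

# The atext pattern, copied verbatim from the grammar in the question.
# Inside a character class "|" is a literal character, not alternation.
ATEXT = re.compile(
    r"[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]"
)


def in_atext(ch):
    """Return True if the single character ch is matched by atext."""
    return ATEXT.fullmatch(ch) is not None


for ch in (":", "_", "@"):
    print(repr(ch), in_atext(ch))
# ':' False
# '_' True
# '@' False
```

Since ":" (0x3A) falls between the \x2F-\x39 range and \x3D, a local part has to end before it, and the only place the path rule accepts ":" is after an a_d_l that begins with "@".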

