Lark Parser: No terminal defined for ':' (Seeming bias against colon character ":")
I have the following rule (taken from SMTP, RFC 5321):
!path : "<" [ a_d_l ":" ] mailbox ">"
When I try to parse this line:
<test.com:test.test@testtest.com>
I get the following error:
No terminal defined for ':'
Oddly, if I simply change ":" to "_", it somehow works:
!path : "<" [ a_d_l "_" ] mailbox ">"
<test.com_test.test@testtest.com>
A line that omits the [ a_d_l ":" ] part (which is optional, as the [] indicates) also works:
!path : "<" [ a_d_l ":" ] mailbox ">"
<test.test@testtest.com>
I have also tried defining a terminal rule for the colon, but that does not work either:
!path : "<" [ a_d_l COLON ] mailbox ">"
COLON : ":"
<test.test@testtest.com>
Minimal reproducible example (as requested in the comments):
import traceback

from lark import Lark
grammar = r'''
!path : "<" [ a_d_l ":" ] mailbox ">"
a_d_l : at_domain ( "," at_domain )*
at_domain : "@" domain
domain : sub_domain ("." sub_domain)*
sub_domain : let_dig [ldh_str]
let_dig : ALPHA | DIGIT
!ldh_str : ( ALPHA | DIGIT | "-" )* let_dig
address_literal : "[" ( ipv4_address_literal | ipv6_address_literal | general_address_literal ) "]"
ipv4_address_literal : snum ("." snum)~3
snum : DIGIT~1..3
ipv6_address_literal : "ipv6:" ipv6_addr
ipv6_addr : ipv6_full | ipv6_comp | ipv6v4_full | ipv6v4_comp
ipv6_full : ipv6_hex (":" ipv6_hex)~7
ipv6_hex : HEXDIG~1..4
!ipv6_comp : [ipv6_hex (":" ipv6_hex)~0..5] "::" [ipv6_hex (":" ipv6_hex)~0..5]
!ipv6v4_full : ipv6_hex (":" ipv6_hex)~5 ":" ipv4_address_literal
!ipv6v4_comp : [ipv6_hex (":" ipv6_hex)~0..3] "::" [ipv6_hex (":" ipv6_hex)~0..3 ":"] ipv4_address_literal
!general_address_literal : standardized_tag ":" dcontent+
standardized_tag : ldh_str
dcontent : /[\x21-\x5A|\x5E-\x7E]/
mailbox : local_part /[\x40]/ ( domain | address_literal )
local_part : dot_string | quoted_string
dot_string : atom ("." atom)*
atom : atext+
quoted_string : /[\x22]/ qcontentsmtp* /[\x22]/
qcontentsmtp : qtextsmtp | quoted_pairsmtp
quoted_pairsmtp : /[\x5C\x5C]/ /[\x20-\x7E]/
qtextsmtp : /[\x20-\x21|\x23-\[\]-\x7E]/
atext : /[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]/
command : [ path ]
%import common.WS -> SP
%import common.NEWLINE -> CRLF
%import common.DIGIT
%import common.LETTER -> ALPHA
%import common.HEXDIGIT -> HEXDIG'''
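# Named `text` rather than `input` to avoid shadowing the built-in.
text = "<test.com:test.test@testtest.com>"
try:
    result = Lark(grammar, start="command").parse(text)
except Exception as ex:
    print('####### Parsing Failed')
    print(ex)
    traceback.print_exc()  # requires `import traceback` at the top of the script
    result = None
print(result)  # the original `return result` was outside any function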
!path : "<" [ a_d_l ":" ] mailbox ">"
a_d_l : at_domain ( "," at_domain )*
at_domain : "@" domain
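These rules will only match "<@test.com:test.test@testtest.com>". They cannot match "<test.com:test.test@testtest.com>", because that input starts with neither "<" at_domain (an at_domain must begin with "@") nor a complete "<" mailbox: after reading test.com as the start of a local_part, the parser needs ".", "@" or another atext character, and ":" is none of these. Replacing ":" with "_" appears to work only because "_" (\x5F) falls inside the atext range \x5E-\x7E, so the whole of test.com_test.test is read as the local part of the mailbox.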
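The behaviour can be reproduced without Lark using a rough regular-expression approximation of the rules involved. This is only a sketch under stated assumptions: the names `PATH`, `matches_path` and `is_atext` are hypothetical, the `atext` character class is copied from the question's grammar, and quoted strings and address literals are deliberately left out.

```python
import re

# Character class copied verbatim from the `atext` rule in the question's grammar.
ATEXT = r"[\x21|\x23-\x27|\x2A|\x2B|\x2D|\x2F-\x39|\x3D|\x3F|\x41-\x5A|\x5E-\x7E]"

# Simplified stand-ins for the grammar rules (assumption: no quoted_string
# local parts and no address_literal domains).
LABEL = r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?"   # sub_domain
DOMAIN = rf"{LABEL}(?:\.{LABEL})*"                     # domain
AT_DOMAIN = rf"@{DOMAIN}"                              # at_domain: must start with "@"
A_D_L = rf"{AT_DOMAIN}(?:,{AT_DOMAIN})*"               # a_d_l
DOT_STRING = rf"{ATEXT}+(?:\.{ATEXT}+)*"               # dot_string
MAILBOX = rf"{DOT_STRING}@{DOMAIN}"                    # mailbox
PATH = re.compile(rf"<(?:{A_D_L}:)?{MAILBOX}>")        # path


def matches_path(text: str) -> bool:
    """Return True if the whole string fits the simplified path rule."""
    return PATH.fullmatch(text) is not None


def is_atext(ch: str) -> bool:
    """Return True if a single character belongs to the atext class."""
    return re.fullmatch(ATEXT, ch) is not None


if __name__ == "__main__":
    for candidate in (
        "<@test.com:test.test@testtest.com>",
        "<test.com:test.test@testtest.com>",
        "<test.com_test.test@testtest.com>",
        "<test.test@testtest.com>",
    ):
        print(candidate, matches_path(candidate))
    for ch in ("_", ":"):
        print(repr(ch), hex(ord(ch)), is_atext(ch))
```

Under these assumptions, the only rejected input is the one whose source route lacks the leading "@", and the character check shows "_" is atext while ":" is not. The corresponding fix in the original setup is to change the input (prefix the route with "@") rather than the grammar.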