
ABNF rule `zero = ["0"] "0"` matches `00` but not `0`

I have the following ABNF grammar:

zero = ["0"] "0"

I would expect this to match the strings 0 and 00, but it only seems to match 00. Why?

repl-it demo: https://repl.it/@DanStevens/abnf-rule-zero-0-0-matches-00-but-not-0

Good question.

ABNF ("Augmented Backus-Naur Form") is defined by RFC 5234, the current version of a document intended to clarify a notation used (with variations) by many RFCs.

Unfortunately, while RFC 5234 exhaustively describes the syntax of ABNF, it says very little about its semantics. In particular, it does not specify whether ABNF alternation is unordered (as in the formal-language definition of BNF) or ordered (as in PEG, Parsing Expression Grammar, notation). Note that optionality and repetition are just special cases of alternation, so whichever convention you choose for alternation, you will most likely choose the same one for optionality and repetition.

The difference matters in cases like this one. If alternation is ordered, the parser will not back up to try a different alternative once some alternative has succeeded. For optionality, this means that if the optional element is present in the input, the parser will never reconsider its decision to accept it, even if some subsequent element then fails to match. Under that view, alternation does not distribute over concatenation: ["0"] "0" is precisely ("0" / "") "0", which is different from "00" / "0". The latter expression matches a single 0 because the second alternative is tried after the first one fails. The former expression, which is the one you use, does not: the optional "0" consumes the only 0, and the trailing "0" then has nothing left to match.
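To make the two readings concrete, here is a minimal Python sketch. The combinators (`lit_*`, `alt_*`, `seq_*`) are hypothetical helpers written only for illustration, not part of any ABNF library: the `_all` versions implement unordered alternation by tracking every possible end position, while the `_peg` versions commit to the first alternative that succeeds, as a PEG does. It assumes a string matches only if the whole input is consumed.

```python
# Illustrative sketch only: hypothetical combinators, not a real ABNF library.

# --- Unordered semantics: a parser returns the set of all possible end positions ---
def lit_all(s):
    return lambda text, pos: {pos + len(s)} if text.startswith(s, pos) else set()

def alt_all(*ps):
    """Unordered alternation: keep the results of every alternative."""
    return lambda text, pos: set().union(*(p(text, pos) for p in ps))

def seq_all(*ps):
    """Concatenation: thread every possible end position through each part."""
    def p(text, pos):
        ends = {pos}
        for q in ps:
            ends = {e for s in ends for e in q(text, s)}
        return ends
    return p

# --- Ordered (PEG) semantics: a parser returns one end position or None ---
def lit_peg(s):
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def alt_peg(*ps):
    """Ordered choice: commit to the first alternative that succeeds."""
    def p(text, pos):
        for q in ps:
            end = q(text, pos)
            if end is not None:
                return end
        return None
    return p

def seq_peg(*ps):
    def p(text, pos):
        for q in ps:
            pos = q(text, pos)
            if pos is None:
                return None
        return pos
    return p

def matches_all(p, text):
    return len(text) in p(text, 0)

def matches_peg(p, text):
    return p(text, 0) == len(text)

# zero = ["0"] "0", written as ("0" / "") "0"
zero_all = seq_all(alt_all(lit_all("0"), lit_all("")), lit_all("0"))
zero_peg = seq_peg(alt_peg(lit_peg("0"), lit_peg("")), lit_peg("0"))
# "00" / "0": the alternation distributed over the concatenation
dist_peg = alt_peg(lit_peg("00"), lit_peg("0"))

for s in ["0", "00"]:
    print(s, matches_all(zero_all, s), matches_peg(zero_peg, s), matches_peg(dist_peg, s))
```

This prints `0 True False True` and then `00 True True True`: only the PEG reading of ["0"] "0" rejects a lone 0, while the distributed form "00" / "0" accepts it even under PEG.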

I do not believe the authors of RFC 5234 took this view, although it would have been much more helpful had they made that decision explicit in the document. My only real evidence for this belief is that the ABNF included in RFC 5234 to describe ABNF itself would fail if repetition were considered ordered. In particular, consider the rule for repetitions:

repetition     =  [repeat] element
repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)

Under ordered semantics, this cannot match 7*"0": the 7 is matched by the first alternative of repeat (1*DIGIT), which is accepted as satisfying the optional [repeat] in repetition, and element then fails on the remaining *"0".
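The failure can be reproduced with a rough Python sketch that builds the two RFC 5234 rules above under both readings. All combinator names are hypothetical, and `element` is deliberately simplified (an assumption) to accept only a quoted string such as "0"; the real element rule also allows rule names, groups, options, numeric values and prose values.

```python
# Sketch: repetition = [repeat] element ; repeat = 1*DIGIT / (*DIGIT "*" *DIGIT)
# Hypothetical simplification: element only matches a quoted char-val like "0".
def is_digit(c):
    return "0" <= c <= "9"

# ---- Unordered semantics: a parser returns the set of all end positions ----
def char_all(pred):
    return lambda t, i: {i + 1} if i < len(t) and pred(t[i]) else set()

def lit_all(s):
    return lambda t, i: {i + len(s)} if t.startswith(s, i) else set()

def alt_all(*ps):
    return lambda t, i: set().union(*(p(t, i) for p in ps))

def seq_all(*ps):
    def p(t, i):
        ends = {i}
        for q in ps:
            ends = {e for s in ends for e in q(t, s)}
        return ends
    return p

def star_all(q, min_count=0):
    def p(t, i):
        results, frontier, count = set(), {i}, 0
        while frontier:
            if count >= min_count:
                results |= frontier
            # keep only repetitions that consume input, so the loop terminates
            frontier = {e for s in frontier for e in q(t, s) if e > s}
            count += 1
        return results
    return p

# ---- Ordered (PEG) semantics: a parser returns one end position or None ----
def char_peg(pred):
    return lambda t, i: i + 1 if i < len(t) and pred(t[i]) else None

def lit_peg(s):
    return lambda t, i: i + len(s) if t.startswith(s, i) else None

def alt_peg(*ps):
    def p(t, i):
        for q in ps:  # commit to the first alternative that succeeds
            e = q(t, i)
            if e is not None:
                return e
        return None
    return p

def seq_peg(*ps):
    def p(t, i):
        for q in ps:
            i = q(t, i)
            if i is None:
                return None
        return i
    return p

def star_peg(q, min_count=0):
    def p(t, i):
        count = 0
        while (e := q(t, i)) is not None and e > i:  # greedy, never gives back
            i, count = e, count + 1
        return i if count >= min_count else None
    return p

def build(char, lit, alt, seq, star):
    digit = char(is_digit)
    element = seq(lit('"'), star(char(lambda c: c != '"')), lit('"'))
    repeat = alt(star(digit, 1), seq(star(digit), lit("*"), star(digit)))
    return seq(alt(repeat, lit("")), element)  # [repeat] element

repetition_all = build(char_all, lit_all, alt_all, seq_all, star_all)
repetition_peg = build(char_peg, lit_peg, alt_peg, seq_peg, star_peg)

text = '7*"0"'
print(len(text) in repetition_all(text, 0))  # True
print(repetition_peg(text, 0) == len(text))  # False
```

Under the PEG reading, 1*DIGIT grabs the 7, the ordered choice commits to it, and the simplified element then fails at the `*`; the unordered reading keeps the second alternative of repeat alive and succeeds.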

In fact, this example (or one similar to it) was reported to the IETF as an erratum to RFC 5234, and the erratum was rejected as unnecessary because the verifier believed the correct parse should be produced. That is evidence that the official view is that ABNF is not a variant of PEG. Apparently, this view is not shared by the author of the APG parser generator (who also does not appear to document their interpretation). The suggested erratum chose roughly the same solution you came up with:

repeat         =  *DIGIT ["*" *DIGIT]

although, strictly speaking, it is not the same: the original repeat cannot match the empty string, but the replacement can. (Since the only use of repeat in the grammar is optional, this makes no practical difference.)
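A PEG-only Python sketch (again with hypothetical combinators and the same assumed, simplified `element` that only accepts a quoted string) compares the original repeat with the erratum's replacement. It illustrates both points: under ordered semantics the replacement accepts 7*"0", and unlike the original it also matches the empty string.

```python
# Sketch under ordered (PEG) semantics; all names are hypothetical.
#   repeat_original = 1*DIGIT / (*DIGIT "*" *DIGIT)
#   repeat_fixed    = *DIGIT ["*" *DIGIT]
# Assumption: element is simplified to a quoted char-val such as "0".

def char(pred):
    return lambda t, i: i + 1 if i < len(t) and pred(t[i]) else None

def lit(s):
    return lambda t, i: i + len(s) if t.startswith(s, i) else None

def alt(*ps):
    """Ordered choice: the first alternative that succeeds wins."""
    def p(t, i):
        for q in ps:
            e = q(t, i)
            if e is not None:
                return e
        return None
    return p

def seq(*ps):
    def p(t, i):
        for q in ps:
            i = q(t, i)
            if i is None:
                return None
        return i
    return p

def star(q, min_count=0):
    """Greedy repetition that never gives back what it consumed."""
    def p(t, i):
        count = 0
        while (e := q(t, i)) is not None and e > i:
            i, count = e, count + 1
        return i if count >= min_count else None
    return p

def opt(q):
    return alt(q, lit(""))  # [x] behaves as (x / "")

digit = char(lambda c: "0" <= c <= "9")
element = seq(lit('"'), star(char(lambda c: c != '"')), lit('"'))
repeat_original = alt(star(digit, 1), seq(star(digit), lit("*"), star(digit)))
repeat_fixed = seq(star(digit), opt(seq(lit("*"), star(digit))))

def full(p, t):
    return p(t, 0) == len(t)

for t in ['7*"0"', '*"0"', '3"0"', '"0"']:
    print(t, full(seq(opt(repeat_original), element), t),
          full(seq(opt(repeat_fixed), element), t))

print(full(repeat_original, ""), full(repeat_fixed, ""))
```

The loop prints `7*"0" False True`, then `True True` for each of the other three inputs, and the last line prints `False True`, showing that only the replacement rule matches the empty string.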

(Disclosure note: I am not a fan of PEG. So it's possible the above answer is not free of bias.)
