LEX Pattern for matching compressed textual representation of an IP version 6 address

I am aware that there are lots of posts on Stack Overflow and elsewhere about regular expressions, including LEX patterns, for IPv6 addresses. None of them appears to be truly complete, and indeed some requirements do not need to parse all possible address formats.

I am looking for a LEX pattern that matches IPv6 addresses only when they are represented in compressed textual form. This form is described in Section 2.2 of RFC 5952 (and possibly other related RFCs) and represents a relatively small subset of all possible IPv6 address formats.

If anyone has a well-tested one, or knows of one, please share it.

RFC 5952 §2.2 does not formally describe the compressed IPv6 address form. The goal of RFC 5952 is to produce a "canonical textual representation form"; that is, a set of textual encodings which has a one-to-one relationship with the set of IPv6 addresses. Section 2.2 enumerates a few aspects of the compressed form which lead to encoding options; a canonical representation needs to eliminate all options.

The compressed syntax is actually described in clause 2 of RFC 4291 §2.2. That syntax is easy enough to describe as a regular expression, although it's a little annoying. It would be easier in a syntax which includes an intersection operator for regular expressions (Ragel provides one, for example), but in this case a simple enumeration of possibilities suffices.

If you really want to limit the matches to the canonical representations specified in RFC 5952 §4.2, then you have a slightly more daunting task. The compressed run of 0s must be the longest run of 0s in the uncompressed address, or the first such run if there is more than one longest run of the same length. In addition, per §4.2.2, a single 0 field must not be compressed at all.

That would be possible by making a much longer enumeration of permissible forms where the compressed run satisfies the "first longest" constraint. But I'm really not sure that there is any value in creating that monster, since RFC 5952 is quite clear that the intent is to restrict the set of representations produced by a conforming application, not the set it accepts:

…[A]ll implementations MUST accept and be able to handle any legitimate RFC4291 format.

Since regular expressions are mostly of use in recognising and parsing inputs, it seems unnecessary to go to the trouble of writing and verifying the list of possible canonical patterns.
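If a canonical form does have to be verified (for example, when testing an application's output), a procedural check is far simpler than the regular-expression monster. The following is a minimal Python sketch, not anything taken from the RFCs. It uses the standard-library `ipaddress` module to parse the text, which is an assumption of convenience since any RFC 4291 parser would do. It then rebuilds the RFC 5952 form from the eight 16-bit fields and compares the two strings. The helper names `rfc5952_text` and `is_canonical` are hypothetical. The sketch also ignores the mixed IPv4 notation of RFC 5952 §5 and scoped (`%zone`) addresses, both of which it simply reports as non-canonical.

```python
import ipaddress


def rfc5952_text(fields):
    """Hypothetical helper: render eight 16-bit integers in RFC 5952 form
    (lower-case hex, no leading zeros, '::' for the first longest run of
    two or more zero fields)."""
    best_start, best_len = -1, 0
    i = 0
    while i < 8:
        if fields[i] != 0:
            i += 1
            continue
        j = i
        while j < 8 and fields[j] == 0:  # find the end of this run of zeros
            j += 1
        if j - i > best_len:  # strict '>' keeps the FIRST of equally long runs
            best_start, best_len = i, j - i
        i = j
    text = [format(f, "x") for f in fields]
    if best_len < 2:  # RFC 5952 §4.2.2: never use '::' for a single 0 field
        return ":".join(text)
    return ":".join(text[:best_start]) + "::" + ":".join(text[best_start + best_len:])


def is_canonical(s):
    """Hypothetical helper: True if s is a valid address already in RFC 5952 form."""
    try:
        packed = ipaddress.IPv6Address(s).packed  # 16 bytes, network order
    except ValueError:  # ipaddress.AddressValueError subclasses ValueError
        return False
    fields = [int.from_bytes(packed[k:k + 2], "big") for k in range(0, 16, 2)]
    return rfc5952_text(fields) == s


for s in ["2001:db8::1", "2001:DB8::1", "2001:db8:0:0:1::1", "2001:db8::1:0:0:1"]:
    print(s, is_canonical(s))
```

The last two strings denote the same address, which has two equally long runs of zeros. Only the second one compresses the first of those runs, so only the second one is canonical.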

An IPv6 address conforming to clause 1 of RFC 4291 §2.2 can easily be described in lex syntax:

piece      [[:xdigit:]]{1,4}
%%
{piece}(:{piece}){7}      { /* an uncompressed IPv6 address */ }
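To experiment with the pattern outside lex, it can be transcribed into Python's `re` module. This is only a sketch. Python's `re` has no POSIX `[[:xdigit:]]` class, so the hex digits are spelled out. The sketch also uses `fullmatch` to test a whole string, whereas a lex rule matches the longest prefix of the remaining input. The names `PIECE` and `UNCOMPRESSED` are illustrative.

```python
import re

# lex: piece [[:xdigit:]]{1,4}   (Python's re has no POSIX [[:xdigit:]] class)
PIECE = r"[0-9A-Fa-f]{1,4}"

# lex: {piece}(:{piece}){7}, i.e. exactly eight pieces joined by single colons
UNCOMPRESSED = re.compile(rf"{PIECE}(?::{PIECE}){{7}}")

for s in ["2001:db8:0:0:0:ff00:42:8329",
          "2001:0DB8:0000:0000:0000:FF00:0042:8329",
          "2001:db8::ff00:42:8329"]:
    print(s, UNCOMPRESSED.fullmatch(s) is not None)
```

Leading zeros and upper case are accepted, as RFC 4291 allows. Anything containing `::` is not.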

In passing, although it seems unnecessary for the same reasons noted above, it's very simple to restrict {piece} to the canonical 16-bit representations (lower-case only, no leading zeros):

piece    0|[1-9a-f][0-9a-f]{0,3}
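The restricted definition is easy to check exhaustively. The Python transcription below uses the hypothetical names `CANON_PIECE` and `is_canonical_piece`. It confirms that the pattern accepts the lower-case, no-leading-zero spelling of every 16-bit value, which is what Python's `format(v, "x")` produces. The pattern accepts exactly 1 + 15 × (1 + 16 + 256 + 4096) = 65536 strings, so it is in one-to-one correspondence with the 16-bit values.

```python
import re

# lex: piece  0|[1-9a-f][0-9a-f]{0,3}
CANON_PIECE = re.compile(r"0|[1-9a-f][0-9a-f]{0,3}")


def is_canonical_piece(s):
    return CANON_PIECE.fullmatch(s) is not None


# format(v, "x") writes v in lower-case hex without leading zeros.
spellings = {format(v, "x") for v in range(0x10000)}
assert len(spellings) == 0x10000
assert all(is_canonical_piece(s) for s in spellings)

for s in ["0", "db8", "0db8", "00", "DB8"]:
    print(s, is_canonical_piece(s))
```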

The complication comes with the requirement in clause 2 that only one run of 0s be compressed. It's easy to write a regular expression which allows exactly one run of pieces to be omitted (that is, exactly one ::):

(({piece}:)*{piece})?::({piece}(:{piece})*)?

but that formulation no longer limits the number of pieces to 8. It's also fairly easy to write a regular expression which allows omitted pieces while limiting the number of fields to 8, although it doesn't limit how many pieces are omitted:

{piece}(:{piece}?){1,6}:{piece}|:(:{piece}){1,7}|({piece}:){1,7}:|::
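Transcribing both patterns into Python's `re` makes each one's over-acceptance concrete. Python's `re` has no intersection operator either, so checking a string against both patterns is used here as a procedural stand-in for the intersection. This is an illustrative sketch. The names `compile_lex`, `ONE_GAP`, `FEW_FIELDS` and `in_both` are hypothetical.

```python
import re

PIECE = r"(?:[0-9A-Fa-f]{1,4})"


def compile_lex(template):
    """Expand each {piece} reference, roughly as lex substitutes a definition."""
    return re.compile(template.replace("{piece}", PIECE))


# Exactly one '::', but no limit on how many pieces surround it.
ONE_GAP = compile_lex(r"(({piece}:)*{piece})?::({piece}(:{piece})*)?")

# At most eight fields, but any number of them may be empty.
FEW_FIELDS = compile_lex(
    r"{piece}(:{piece}?){1,6}:{piece}|:(:{piece}){1,7}|({piece}:){1,7}:|::"
)


def in_both(s):
    """Procedural stand-in for the intersection of ONE_GAP and FEW_FIELDS."""
    return bool(ONE_GAP.fullmatch(s)) and bool(FEW_FIELDS.fullmatch(s))


for s in ["1:2:3:4:5::6:7:8:9", "1::2::3", "2001:db8::1"]:
    print(s, bool(ONE_GAP.fullmatch(s)), bool(FEW_FIELDS.fullmatch(s)), in_both(s))
```

The first string has nine pieces but only one gap. The second fits in eight fields but has two gaps. Only the third is accepted by both patterns. An uncompressed address such as `1:2:3:4:5:6:7:8` is matched by `FEW_FIELDS` but not by `ONE_GAP`, which is why the uncompressed pattern has to be added separately.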

What's desired is the intersection of those two patterns, plus the pattern for uncompressed addresses. But, as mentioned, there's no way of writing intersections in (f)lex, so we end up enumerating possibilities. A simple enumeration is by the number of pieces before the :: (the first alternative is the uncompressed form):

(?x:  /* Flex's extended syntax allows whitespace and continuation lines */
    {piece}(:{piece}){7}
  | {piece}             ::{piece}(:{piece}){0,5}
  | {piece}:{piece}     ::{piece}(:{piece}){0,4}
  | {piece}(:{piece}){2}::{piece}(:{piece}){0,3}
  | {piece}(:{piece}){3}::{piece}(:{piece}){0,2}
  | {piece}(:{piece}){4}::{piece}(:{piece})?
  | {piece}(:{piece}){5}::{piece}
  | {piece}(:{piece}){0,6}::
  | ::{piece}(:{piece}){0,6}
  | ::
)
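For checking the enumeration outside flex, here is a hedged Python transcription of the rule above. The names `PIECE`, `TEMPLATE`, `matches`, `parses` and `candidates` are illustrative. `re.VERBOSE` plays the part of `(?x:...)`, and `str.replace` stands in for flex's `{piece}` substitution. As a cross-check, the sketch compares the pattern with the standard library's `ipaddress.IPv6Address` parser on every colon-joined string of up to ten fields, where each field is either `1` or empty. That is a sanity test over a small synthetic set, not a proof.

```python
import ipaddress
import re

# lex: piece [[:xdigit:]]{1,4}.  The (?:...) wrapper mimics the parentheses
# flex puts around a substituted definition.
PIECE = r"(?:[0-9A-Fa-f]{1,4})"

# The flex rule above, laid out the same way.  re.VERBOSE ignores whitespace
# like (?x:...), but its comments start with '#' rather than /* ... */.
TEMPLATE = r"""
    {piece}(:{piece}){7}                           # no '::'
  | {piece}             ::{piece}(:{piece}){0,5}   # 1 piece before '::'
  | {piece}:{piece}     ::{piece}(:{piece}){0,4}   # 2 pieces before '::'
  | {piece}(:{piece}){2}::{piece}(:{piece}){0,3}
  | {piece}(:{piece}){3}::{piece}(:{piece}){0,2}
  | {piece}(:{piece}){4}::{piece}(:{piece})?
  | {piece}(:{piece}){5}::{piece}
  | {piece}(:{piece}){0,6}::                       # nothing after '::'
  | ::{piece}(:{piece}){0,6}                       # nothing before '::'
  | ::
"""
IPV6 = re.compile(TEMPLATE.replace("{piece}", PIECE), re.VERBOSE)


def matches(s):
    return IPV6.fullmatch(s) is not None


def parses(s):
    """Reference check with the standard library's parser."""
    try:
        ipaddress.IPv6Address(s)
        return True
    except ValueError:
        return False


def candidates(max_fields=10):
    """Every ':'-joined string of 1..max_fields fields, each field '1' or empty."""
    for k in range(1, max_fields + 1):
        for bits in range(2 ** k):
            yield ":".join("1" if (bits >> i) & 1 else "" for i in range(k))


disagreements = [s for s in candidates() if matches(s) != parses(s)]
print(len(disagreements), "disagreements")
```

Strings with an embedded IPv4 part are deliberately not generated, since `ipaddress` accepts them and this pattern does not.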

That still excludes the various forms of embedding IPv4 addresses in IPv6 (clause 3 of RFC 4291 §2.2), but it should be clear how to add those, if desired.
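One way to add the embedded-IPv4 forms is to stop writing the enumeration by hand and generate it. The Python sketch below is an illustration under stated assumptions, not a tested lex pattern. It enumerates every pair of (pieces before ::, pieces after ::) instead of using counted ranges as the flex rule does. That produces more alternatives, but they describe the same set of strings. It also treats a trailing dotted quad as occupying the last two 16-bit fields. The octet pattern (decimal 0 to 255, no leading zeros) is an assumption, since RFC 4291 only says the d's are decimal values. The names `run`, `alternatives` and `IPV6_ANY` are hypothetical.

```python
import re

HEX = r"(?:[0-9A-Fa-f]{1,4})"
# Assumption: each dotted-quad octet is decimal 0-255 written without leading zeros.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9]?[0-9])"
IPV4 = rf"{OCTET}(?:\.{OCTET}){{3}}"


def run(k):
    """Exactly k hex pieces separated by single colons ('' when k == 0)."""
    return "" if k == 0 else HEX + f"(?::{HEX}){{{k - 1}}}"


def alternatives(hex_fields, suffix=""):
    """All spellings of hex_fields 16-bit pieces followed by an optional suffix
    (here, an IPv4 pattern), with at most one '::' standing for >= 1 zero fields."""
    sep = ":" if suffix else ""
    alts = [run(hex_fields) + sep + suffix]     # no '::' at all
    for n in range(hex_fields):                 # n explicit pieces before '::'
        for m in range(hex_fields - n):         # m after it, so n + m <= hex_fields - 1
            after = run(m) + sep if m else ""
            alts.append(run(n) + "::" + after + suffix)
    return alts


# Eight plain fields, or six hex fields plus a dotted quad filling the last two.
IPV6_ANY = re.compile("|".join(alternatives(8) + alternatives(6, IPV4)))

for s in ["::ffff:192.0.2.1", "64:ff9b::192.0.2.33", "1:2:3:4:5:6:192.0.2.1",
          "1:2:3:4:5::6:192.0.2.1", "2001:db8::1"]:
    print(s, IPV6_ANY.fullmatch(s) is not None)
```

The same generator could emit the alternatives as flex text instead of a Python pattern, if a lex rule is what's wanted.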
