LEX pattern for matching the compressed textual representation of IP version 6 addresses

Question

I am aware that there are lots of posts on Stack Overflow and elsewhere with regular expressions, including LEX patterns, for IPv6 addresses. None of them appear to be truly complete, and indeed some requirements do not need to parse all possible address formats.

I am looking for a LEX pattern for IP version 6 addresses, only for addresses represented in compressed textual form. This form is described in Section 2.2 of RFC 5952 (and possibly other related RFCs) and represents a relatively small subset of all possible IPv6 address formats.

If anyone has one which is well tested, or is aware of one, please forward it.

Answer 1

RFC 5952 §2.2 does not formally describe the compressed IPv6 address form. The goal of RFC 5952 is to produce a "canonical textual representation form"; that is, a set of textual encodings which has a one-to-one relationship with the set of IPv6 addresses. Section 2.2 enumerates a few aspects of the compressed form which lead to encoding options; a canonical representation needs to eliminate all options.

The compressed syntax is actually described in clause 2 of RFC 4291 §2.2. That syntax is easy enough to describe as a regular expression, although it's a little annoying; it would be easier in a syntax which includes the intersection of two regular expressions (Ragel provides that operator, for example), but in this case a simple enumeration of possibilities suffices.

If you really want to limit the matches to the canonical representations listed in RFC 5952 §4.2, then you have a slightly more daunting task because of the requirement that the compressed run of 0s must be the longest run of 0s in the uncompressed address, or the first such run if there is more than one longest run of the same length.

That would be possible by making a much longer enumeration of permissible forms where the compressed run satisfies the "first longest" constraint. But I'm really not sure that there is any value in creating that monster, since RFC 5952 is quite clear that the intent is to restrict the set of representations produced by a conforming application (emphasis added):

…[A]ll implementations MUST accept and be able to handle any legitimate RFC4291 format.

Since regular expressions are mostly of use in recognising and parsing inputs, it seems unnecessary to go to the trouble of writing and verifying the list of possible canonical patterns.

An IPv6 address conforming to clause 1 of RFC 4291 §2.2 can easily be described in lex syntax:

piece      [[:xdigit:]]{1,4}
%%
{piece}(:{piece}){7}      { /* an uncompressed IPv6 address */ }
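For quick experimentation outside of lex, the same pattern can be spelled in Python's re syntax. This is only a sketch: the names PIECE, UNCOMPRESSED and is_uncompressed are made up for the illustration, and [0-9A-Fa-f] is substituted for [[:xdigit:]] because Python's re module has no POSIX character classes.

```python
import re

# Python spelling of the lex definition. [0-9A-Fa-f] stands in for
# [[:xdigit:]], which Python's re module does not support.
PIECE = r"[0-9A-Fa-f]{1,4}"

# Eight pieces separated by seven colons, exactly as in the lex rule.
UNCOMPRESSED = re.compile(rf"{PIECE}(?::{PIECE}){{7}}")


def is_uncompressed(text: str) -> bool:
    """Return True if text is exactly eight colon-separated hex pieces."""
    return UNCOMPRESSED.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("2001:db8:0:0:0:0:0:1", "2001:db8::1", "1:2:3:4:5:6:7"):
        print(candidate, is_uncompressed(candidate))
```

fullmatch is used because a lex rule matches a whole token, whereas re.search would happily find an address buried inside a longer string.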

In passing, although it seems unnecessary for the same reasons noted above, it's very simple to restrict {piece} to the canonical 16-bit representations (lower-case only, no leading zeros):

piece    0|[1-9a-f][0-9a-f]{0,3}
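A small check of what that restricted definition accepts, again as a Python sketch with illustrative names (CANONICAL_PIECE and is_canonical_piece are not from any library):

```python
import re

# Either a lone zero, or up to four lower-case hex digits whose first
# digit is non-zero (so no leading zeros and no upper-case letters).
CANONICAL_PIECE = re.compile(r"0|[1-9a-f][0-9a-f]{0,3}")


def is_canonical_piece(text: str) -> bool:
    """Return True if text is a 16-bit piece in canonical spelling."""
    return CANONICAL_PIECE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("0", "db8", "ffff", "0db8", "DB8", "00"):
        print(repr(candidate), is_canonical_piece(candidate))
```

The first three candidates are accepted; the last three are rejected for a leading zero, upper-case digits, and a redundant zero respectively.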

The complication comes with the requirement in clause 2 that only one run of 0s be compressed. It's easy to write a regular expression which allows only one run of pieces to be omitted (that is, a single ::):

(({piece}:)*{piece})?::({piece}(:{piece})*)?

but that formulation no longer limits the number of pieces to 8. It's also fairly easy to write a regular expression which limits the number of fields while allowing omitted pieces, although it does not restrict the omissions to a single run:

{piece}(:{piece}?){1,6}:{piece}|:(:{piece}){1,7}|({piece}:){1,7}:|::
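The complementary weaknesses of the two patterns are easy to see by trying them. The following Python sketch transcribes both; the names P, ONE_GAP, BOUNDED, one_gap and bounded are invented for the illustration. P is wrapped in a group because, as far as I know, flex parenthesises a definition when it expands it, so {piece}? makes the whole piece optional; a plain textual substitution in Python would instead turn the ? into a lazy quantifier.

```python
import re

# Grouped so that a following '?' applies to the whole piece.
P = r"(?:[0-9A-Fa-f]{1,4})"

# Exactly one '::', but any number of pieces on either side.
ONE_GAP = re.compile(rf"(?:(?:{P}:)*{P})?::(?:{P}(?::{P})*)?")

# At most eight fields, any of which may be empty.
BOUNDED = re.compile(
    rf"{P}(?::{P}?){{1,6}}:{P}|:(?::{P}){{1,7}}|(?:{P}:){{1,7}}:|::"
)


def one_gap(text: str) -> bool:
    """True if text has a single '::' (piece count unchecked)."""
    return ONE_GAP.fullmatch(text) is not None


def bounded(text: str) -> bool:
    """True if text has at most eight fields (number of gaps unchecked)."""
    return BOUNDED.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ("2001:db8::1", "1:2:3:4:5:6:7:8::9", "1::2::3", "1:2:3"):
        print(candidate, one_gap(candidate), bounded(candidate))
```

Only 2001:db8::1 passes both tests. The nine-piece string slips through the first pattern, while 1::2::3 and the too-short 1:2:3 slip through the second.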

What's desired is the intersection of those two patterns, plus the pattern for uncompressed addresses. But, as mentioned, there's no way of writing intersections in (f)lex. So we end up enumerating possibilities. A simple way to enumerate is by the number of pieces which precede the :: :

(?x:  /* Flex's extended syntax allows whitespace and continuation lines */
    {piece}(:{piece}){7}
  | {piece}             ::{piece}(:{piece}){0,5}
  | {piece}:{piece}     ::{piece}(:{piece}){0,4}
  | {piece}(:{piece}){2}::{piece}(:{piece}){0,3}
  | {piece}(:{piece}){3}::{piece}(:{piece}){0,2}
  | {piece}(:{piece}){4}::{piece}(:{piece})?
  | {piece}(:{piece}){5}::{piece}
  | {piece}(:{piece}){0,6}::
  | ::{piece}(:{piece}){0,6}
  | ::
)
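As a sanity check on the enumeration, it can be compared against the intersection it is meant to replace. Python has no intersection operator either, but it can be emulated by requiring two separate matches. The sketch below is my own test harness, not part of any library: all names are hypothetical, re.VERBOSE plays the role of flex's (?x: ), and the brute-force comparison only uses strings built from the characters 1 and :, so it exercises the structure of the patterns rather than the hex digits.

```python
import itertools
import re

P = r"(?:[0-9A-Fa-f]{1,4})"  # one piece, grouped as flex would expand it

# Transcription of the enumerated flex pattern.
ENUMERATED = re.compile(rf"""
      {P}(?::{P}){{7}}
    | {P}             ::{P}(?::{P}){{0,5}}
    | {P}:{P}         ::{P}(?::{P}){{0,4}}
    | {P}(?::{P}){{2}}::{P}(?::{P}){{0,3}}
    | {P}(?::{P}){{3}}::{P}(?::{P}){{0,2}}
    | {P}(?::{P}){{4}}::{P}(?::{P})?
    | {P}(?::{P}){{5}}::{P}
    | {P}(?::{P}){{0,6}}::
    | ::{P}(?::{P}){{0,6}}
    | ::
""", re.VERBOSE)

# The three simpler patterns from earlier in the answer.
UNCOMPRESSED = re.compile(rf"{P}(?::{P}){{7}}")
ONE_GAP = re.compile(rf"(?:(?:{P}:)*{P})?::(?:{P}(?::{P})*)?")
BOUNDED = re.compile(
    rf"{P}(?::{P}?){{1,6}}:{P}|:(?::{P}){{1,7}}|(?:{P}:){{1,7}}:|::"
)


def by_enumeration(text: str) -> bool:
    """Match using the single enumerated pattern."""
    return ENUMERATED.fullmatch(text) is not None


def by_intersection(text: str) -> bool:
    """Match uncompressed, or both ONE_GAP and BOUNDED (the intersection)."""
    if UNCOMPRESSED.fullmatch(text):
        return True
    return bool(ONE_GAP.fullmatch(text)) and bool(BOUNDED.fullmatch(text))


def disagreements(max_len: int) -> list:
    """List every string over '1' and ':' on which the two methods differ."""
    found = []
    for length in range(1, max_len + 1):
        for chars in itertools.product("1:", repeat=length):
            text = "".join(chars)
            if by_enumeration(text) != by_intersection(text):
                found.append(text)
    return found


if __name__ == "__main__":
    print(disagreements(12))
```

If the enumeration really is equivalent to the intersection, the printed list should be empty. Raising max_len to 15 covers the full eight-piece forms at the cost of a somewhat longer run.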

That still excludes the various forms of embedding IPv4 addresses in IPv6, but it should be clear how to add those, if desired.
