Regex returning different result on same input

I am checking some results by passing two inputs from a document. The two inputs look the same, so why do they return different output? My regex is:

(?<preandconjunct>(?:\b([Ss]ubsection|[Ss]ection|[Aa]rticle) +)(?<conjunct>(?:(?<level>(?:(?:[IVXivx]{1,5}(?![A-Z]))|(?:[A-Z]{1,2}(?![A-Z]))|(?:[0-9]+)))|(?<level>\((?:(?:[IVXivx]{1,5}(?![A-Z]))|(?:[A-Z]{1,2}(?![A-Z]))|(?:(?!in|or|if|of|to|as|at|it|no|an)[a-z]{1,2}(?![a-z]))|(?:[0-9]+))\))|(?<level>[\.-](?:(?:[IVXivx]{1,5}(?![A-Z]))|(?:[A-Z]{1,2}(?![A-Z]))|(?:[0-9]+))))+)(?=$|[ ,;.)]))

and the two inputs are:

a dispute under Section 3.1 (which shall be governed exclusively by Section 3.1) or as set forth in Section 11.3(b), the indemnification provisions of this Article XI and Article XII are the sole and exclusive remedies of the Parties pursuant to this Agreement or in connection with the transactions contemplated hereby. From and after the Closing, to the maximum extent permitted by Law, except with respect to claims based on intentional fraud, a dispute under Section 3.1 (which shall be governed exclusively by Section 3.1),

and

a dispute under Section 3.1 (which shall be governed exclusively by Section 3.1) or as set forth in Section 11.3(b), the indemnification provisions of this Article XI and Article XII are the sole and exclusive remedies of the Parties pursuant to this Agreement or in connection with the transactions contemplated hereby. From and after the Closing, to the maximum extent permitted by Law, except with respect to claims based on intentional fraud, a dispute under Section 3.1 (which shall be governed exclusively by Section 3.1),

Also, I am expecting:

Section 3.1

Section 3.1

Section 11.3(b)

Article XI

Article XII

Section 3.1

Section 3.1

The last one is not showing up for the first input.

"last one is not showing up in first input"

One way of getting all characters is to consume up to an anchor character. For example, take the text blah blah Section 3.1 (governed by Section 3.1). We have three anchors: Section, ( and ). Let us create a pattern based on those literal anchors.

I will now comment my regex pattern, which, by the way, needs the IgnorePatternWhitespace option to work properly in the regex parser.

(((Sub)?Section)|Article)\s+    # Anchor of Section or Article or Subsection
(?<Number>[^\s]+)               # Number involved
\s+
   \(                           # Anchor of '('
      (?<Conjuct>[^)]+)         # Consume til next anchor
   \)                           # ')' anchor.

By using the negated set [^ ] we can consume any funky characters which are not an ending ) anchor. Our match result looks like this:

[Image: screenshot of the match result]
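For readers who want to try the anchor pattern without a .NET environment, here is a minimal sketch in Python. It assumes that Python's re.VERBOSE flag is an acceptable stand-in for IgnorePatternWhitespace for this particular pattern, and it rewrites the named groups in Python's (?P<name>...) syntax. The variable names are illustrative only and are not part of the original answer.

```python
import re

# Python port of the commented pattern above. re.VERBOSE lets the pattern
# contain whitespace and '#' comments (assumed here to behave like .NET's
# IgnorePatternWhitespace for this pattern).
ANCHOR_PATTERN = re.compile(
    r"""
    (((Sub)?Section)|Article)\s+   # anchor: Section, SubSection or Article
    (?P<Number>[^\s]+)             # number involved
    \s+
    \(                             # opening parenthesis anchor
        (?P<Conjuct>[^)]+)         # consume until the next anchor
    \)                             # closing parenthesis anchor
    """,
    re.VERBOSE,
)

text = "blah blah Section 3.1 (governed by Section 3.1)"
match = ANCHOR_PATTERN.search(text)

number = match.group("Number")     # text right after the keyword
conjunct = match.group("Conjuct")  # everything between the parentheses
print(number, "|", conjunct)
```

Because the inner group is a negated set, it accepts any character up to the closing parenthesis, including characters that look like ordinary spaces but are not.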

"why are they returning different output"

You need to make the capturing less complex, perhaps by looking at the literal anchors as mentioned. Maybe even do a two-pass regex: first create general tokens from the text, then on the second regex pass extract specific items from those tokens.
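The two-pass idea could be sketched roughly as follows in Python. This is only an illustration under simplifying assumptions: the TOKEN and LEVEL patterns and the extract_references helper are hypothetical, much looser than the pattern in the question, and would also accept an ordinary word that happens to follow a keyword.

```python
import re

# Pass 1: coarse tokens, a keyword followed by one run of non-space text.
TOKEN = re.compile(
    r"\b(?P<kind>[Ss]ubsection|[Ss]ection|[Aa]rticle)\s+(?P<ref>\S+)"
)
# Pass 2: split a token's reference into levels such as "11", ".3", "(b)".
LEVEL = re.compile(r"\([0-9A-Za-z]+\)|[.-]?[0-9A-Za-z]+")


def extract_references(text):
    """Return a normalised reference string for every token in text."""
    references = []
    for token in TOKEN.finditer(text):
        # Stray sentence punctuation such as a trailing ')' or ',' is not
        # matched by LEVEL, so it is dropped here.
        levels = LEVEL.findall(token.group("ref"))
        if levels:
            references.append(token.group("kind") + " " + "".join(levels))
    return references


sample = (
    "a dispute under Section 3.1 (which shall be governed exclusively by "
    "Section 3.1) or as set forth in Section 11.3(b), the indemnification "
    "provisions of this Article XI and Article XII are the sole remedies"
)
references = extract_references(sample)
print(references)
```

Keeping the first pass loose means an unexpected character in the input is less likely to make a whole reference disappear silently; the stricter rules can then be applied to each small token in the second pass.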

Also, comment your pattern as I have done and work on individual pieces of it; once the individual pieces are working, bring the whole pattern together.
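One way to work piece by piece is to keep each fragment in its own variable, check it in isolation, and only then combine the fragments. The sketch below does this in Python; the fragment names and the simplified fragments themselves are assumptions made for illustration and do not reproduce the full pattern from the question.

```python
import re

# Hypothetical, simplified fragments of the full pattern.
KEYWORD = r"(?:[Ss]ubsection|[Ss]ection|[Aa]rticle)"
ROMAN = r"[IVXivx]{1,5}"
DIGITS = r"[0-9]+"
PAREN = r"\([a-z]{1,2}\)"

# Check every fragment on its own before combining anything.
assert re.fullmatch(KEYWORD, "Section")
assert re.fullmatch(ROMAN, "XII")
assert re.fullmatch(DIGITS, "11")
assert re.fullmatch(PAREN, "(b)")
assert not re.fullmatch(PAREN, "(b")

# Bring the tested fragments together into the whole pattern.
LEVEL = rf"(?:{ROMAN}|{DIGITS})"
REFERENCE = re.compile(rf"\b{KEYWORD}\s+{LEVEL}(?:[.-]{LEVEL}|{PAREN})*")

found = REFERENCE.findall(
    "as set forth in Section 11.3(b), this Article XI and Article XII"
)
print(found)
```

If the combined pattern misbehaves, the per-fragment checks narrow down which piece to look at first.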
