Make regex not capture OR capture group

Question

I am struggling to capture which "language" snippets a string contains.

The language snippets are inside () and are a combination of: En, Fr, De, Es, It

Example:

File (En,Fr,De,Es,It).doc    <== should match all 5 languages
File (En,Fr) (Required).doc  <== should match `En` and `Fr`
File (Enfoo,Fr).doc          <== should match only `Fr`
File (E,Fr).doc              <== should match only `Fr`

My current regex:

((\\(|,)En(\\)|,))|((\\(|,)Fr(\\)|,))|((\\(|,)De(\\)|,))|((\\(|,)Es(\\)|,))|((\\(|,)It(\\)|,))

What it means:

((\(|,)  <== either starts with `open parenthesis` or `comma`  (1)
En       <== the language                                      (2)
(\)|,))  <== either ends with `close parenthesis` or `comma`   (3)

Then I just join the per-language patterns with the regex OR (|).

The problem, as you can see at regexr.com/3ev6p, is that a second language snippet such as Fr will not satisfy part (1) of the regex, because the match for the first snippet En has already consumed the comma that follows it. As a result, the second language snippet Fr is not matched.

Do you know how to capture all of the language snippets? I am planning to use PHP's preg_match_all() to get them. Hope somebody can help. Thank you!

Answer 1

The regex you have consumes the commas around the language codes. That means that after a match is found, the regex index sits after the trailing comma. A match for the next language would have to begin with that already-consumed comma, so the regex engine skips the language that follows it.

To obtain such overlapping matches, lookarounds can be used:
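To make the skipping visible, here is a minimal PHP sketch. It uses a condensed form of the question's pattern (an assumption made for brevity: character classes instead of the five spelled-out alternatives), and the helper name `consumingMatches` is hypothetical.

```php
<?php
// Condensed equivalent of the question's pattern (assumption for brevity):
// the delimiter on each side is consumed as part of every match.
$consuming = '/[(,](En|Fr|De|Es|It)[,)]/';

// Hypothetical helper: returns the full text of every match.
function consumingMatches(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

// Each match swallows its trailing comma, so the code that needs
// that same comma as its leading delimiter can no longer match.
echo json_encode(consumingMatches($consuming, 'File (En,Fr,De,Es,It).doc')), PHP_EOL;
```

For the first example filename the full matches are `(En,`, `,De,` and `,It)`, so Fr and Es are lost because their leading commas were already consumed.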

(?<=[(,])(En|Fr|De|Es|It)(?=[,)])
^^^^^^^^^                ^^^^^^^^

See this regex demo.

The (?<=[(,]) is a positive lookbehind that requires a , or ( immediately before the language code, and (?=[,)]) is a positive lookahead that requires a comma or ) immediately to the right of the language code. The comma/parenthesis is not consumed, so it remains available to be matched during the next iteration.

Another solution that is possible here is the use of word boundaries (as already described in the comments). Word boundaries help match whole words.
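As a rough sketch of how the lookaround pattern could be used with preg_match_all(), the snippet below runs it over the four filenames from the question. The function name `findLanguages` is hypothetical and chosen only for illustration.

```php
<?php
// Hypothetical helper: returns the language codes found in a filename.
function findLanguages(string $filename): array
{
    // The lookbehind and lookahead only assert the delimiters, they do not
    // consume them, so adjacent codes can share a comma.
    $pattern = '/(?<=[(,])(En|Fr|De|Es|It)(?=[,)])/';
    preg_match_all($pattern, $filename, $matches);
    return $matches[1]; // contents of capture group 1
}

$samples = [
    'File (En,Fr,De,Es,It).doc',
    'File (En,Fr) (Required).doc',
    'File (Enfoo,Fr).doc',
    'File (E,Fr).doc',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', implode(',', findLanguages($sample)), PHP_EOL;
}
```

Because the whole match is just the language code, `$matches[0]` and `$matches[1]` hold the same values here; the capture group could be dropped if only the codes are needed.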

\b(En|Fr|De|Es|It)\b

See the regex demo.
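A short PHP sketch of the word-boundary variant follows. The function name `findByWordBoundary` and the filename `It (En).doc` are hypothetical, added only to show one trade-off: word boundaries do not require the code to sit inside parentheses.

```php
<?php
// Hypothetical helper: matches the language codes as whole words.
function findByWordBoundary(string $filename): array
{
    preg_match_all('/\b(En|Fr|De|Es|It)\b/', $filename, $m);
    return $m[1];
}

// "Enfoo" is rejected because no word boundary follows "En".
echo implode(',', findByWordBoundary('File (Enfoo,Fr).doc')), PHP_EOL;

// Made-up filename: a whole-word code outside the parentheses also matches.
echo implode(',', findByWordBoundary('It (En).doc')), PHP_EOL;
```

If codes may appear as ordinary words elsewhere in the filename, the lookaround pattern is the stricter choice, since it ties each code to a surrounding comma or parenthesis.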

Answer 2

This should match all:

 (?<=,|\()(\w\w)(?=,|\))

Used together with preg_match_all, it should do the job.

Explained:

A lookbehind assertion (the match must be preceded by "," or "(")
Two word characters (so you don't have to specify beforehand which languages you are targeting)
A lookahead assertion (the match must be followed by "," or ")")

And that's it. :)

Working version.

Regards.
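The three parts above can be sketched in PHP as follows. The function name `findTwoLetterCodes` and the filename `File (Pt,12).doc` are hypothetical; the second one is there to show that \w also accepts digits and underscores, so this pattern is more permissive than an explicit list of codes.

```php
<?php
// Hypothetical helper: returns any two word characters that sit
// between a "(" or "," on the left and a "," or ")" on the right.
function findTwoLetterCodes(string $filename): array
{
    preg_match_all('/(?<=,|\()(\w\w)(?=,|\))/', $filename, $m);
    return $m[1];
}

echo implode(',', findTwoLetterCodes('File (En,Fr) (Required).doc')), PHP_EOL;

// Made-up filename: codes not in the original list, including digits, match too.
echo implode(',', findTwoLetterCodes('File (Pt,12).doc')), PHP_EOL;
```

Replacing `\w\w` with `[A-Z][a-z]` would be one way to narrow the match to capitalized two-letter codes, assuming that is the only form the codes take.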

Answer 1
3 (accepted) 2016-12-27 19:29:37

Answer 2
1 2016-12-27 19:41:13