Make regex not capture the OR capture group
So, I am struggling to capture which "language" snippets a string contains. The language snippets are inside `()` and are a combination of: En, Fr, De, Es, It.
Examples:
File (En,Fr,De,Es,It).doc <== should match all 5 languages
File (En,Fr) (Required).doc <== should match `En` and `Fr`
File (Enfoo,Fr).doc <== should match only `Fr`
File (E,Fr).doc <== should match only `Fr`
My current regex:
((\\(|,)En(\\)|,))|((\\(|,)Fr(\\)|,))|((\\(|,)De(\\)|,))|((\\(|,)Es(\\)|,))|((\\(|,)It(\\)|,))
What it means:
((\(|,) <== either starts with `open parenthesis` or `comma` (1)
En <== the language (2)
(\)|,)) <== either ends with `close parenthesis` or `comma` (3)
Then I just join the five alternatives with the regex OR operator (`|`).
The problem, as you can see at regexr.com/3ev6p, is that if there is a second language snippet, e.g. Fr, it won't satisfy part (1) of the regex. The first language snippet, En, has already captured/consumed the comma that Fr needs in front of it, so the second language snippet Fr is not matched...
Do you know how to capture all of the language snippets? I am planning to use PHP's preg_match_all() to get them all. Hope somebody can help. Thank you!
The regex you have consumes the commas around the language codes. That means that after a match, the regex engine resumes right after the trailing comma. The next language code no longer has a comma or `(` in front of it that can still be matched, so it fails and is skipped.
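To make the skipping visible, here is a minimal sketch that runs the question's regex over the first example. It uses Python's `re` instead of PHP, on the assumption that PCRE and Python agree on these simple constructs; `consuming_matches` is a hypothetical helper that mimics the full-match array (`$matches[0]`) from `preg_match_all()`.

```python
import re

# The question's regex as a raw string (single backslashes). Every
# alternative consumes the delimiter both before AND after the code.
CONSUMING = (r"((\(|,)En(\)|,))|((\(|,)Fr(\)|,))|((\(|,)De(\)|,))"
             r"|((\(|,)Es(\)|,))|((\(|,)It(\)|,))")


def consuming_matches(filename: str) -> list[str]:
    """Return the full text of each match (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(CONSUMING, filename)]


print(consuming_matches("File (En,Fr,De,Es,It).doc"))
# ['(En,', ',De,', ',It)'] : Fr and Es are skipped
```

After `(En,` the search resumes at `F`. The comma before `Fr` is already used up, so the next match is `,De,`, and the same thing then happens to `Es`.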
To match such overlapping matches, lookarounds can be used:
(?<=[(,])(En|Fr|De|Es|It)(?=[,)])
^^^^^^^^^                ^^^^^^^^
See this regex demo.
The `(?<=[(,])` is a positive lookbehind that requires a `,` or `(` before the language code. The `(?=[,)])` is a positive lookahead that requires a `,` or `)` to the right of the language code. Because lookarounds do not consume the comma/parenthesis, it is still available to be matched during the next iteration.
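A minimal sketch of the lookaround version against the four example filenames. It again uses Python's `re` as a stand-in for PHP's PCRE. The helper name `languages` is hypothetical, and `findall()` returning group 1 roughly corresponds to `$matches[1]` from `preg_match_all()`.

```python
import re

# Lookbehind (?<=[(,]) and lookahead (?=[,)]) check the delimiters without
# consuming them, so one comma can sit next to two language codes.
LOOKAROUND = re.compile(r"(?<=[(,])(En|Fr|De|Es|It)(?=[,)])")


def languages(filename: str) -> list[str]:
    # With exactly one capturing group, findall() returns that group's text.
    return LOOKAROUND.findall(filename)


print(languages("File (En,Fr,De,Es,It).doc"))    # ['En', 'Fr', 'De', 'Es', 'It']
print(languages("File (En,Fr) (Required).doc"))  # ['En', 'Fr']
print(languages("File (Enfoo,Fr).doc"))          # ['Fr']
print(languages("File (E,Fr).doc"))              # ['Fr']
```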
Another possible solution is to use word boundaries (as already described in the comments). A word boundary `\b` matches, without consuming anything, at a position between a word character and a non-word character, so it helps match whole words only.
\b(En|Fr|De|Es|It)\b
See the regex demo.
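A minimal sketch of the word-boundary variant, again in Python's `re` rather than PHP (assumed equivalent here because the inputs are plain ASCII). The helper `languages_wb` and the `De Niro (Fr).doc` filename are hypothetical. The last line shows a trade-off: `\b` does not require the parentheses, so a code outside them also matches.

```python
import re

# \b is zero-width: "En" inside "Enfoo" is rejected because "f" is a word
# character, and the commas/parentheses around a code are never consumed.
WORD_BOUNDARY = re.compile(r"\b(En|Fr|De|Es|It)\b")


def languages_wb(filename: str) -> list[str]:
    return WORD_BOUNDARY.findall(filename)


print(languages_wb("File (En,Fr,De,Es,It).doc"))  # ['En', 'Fr', 'De', 'Es', 'It']
print(languages_wb("File (Enfoo,Fr).doc"))        # ['Fr']
# Caveat: parentheses are not required, so "De" at the start matches too.
print(languages_wb("De Niro (Fr).doc"))           # ['De', 'Fr']
```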
This should match all of them:
(?<=,|\()(\w\w)(?=,|\))
Combined with preg_match_all(), it should do the job.
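Explained:
(?<=,|\() <== lookbehind: preceded by a comma or an open parenthesis (not consumed)
(\w\w) <== captures exactly two word characters (the language code)
(?=,|\)) <== lookahead: followed by a comma or a close parenthesis (not consumed)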
And that's it. :)
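A minimal sketch of this two-character pattern in Python's `re`, standing in for PHP's `preg_match_all()`. Python accepts the `(?<=,|\()` lookbehind because both branches are one character wide. The helper `codes` is hypothetical. The last line shows a caveat: `\w\w` accepts any two word characters (letters, digits, underscore), not only the five language codes.

```python
import re

# (?<=,|\()  lookbehind: preceded by a comma or "(" (equal-width branches)
# (\w\w)     captures exactly two word characters
# (?=,|\))   lookahead: followed by a comma or ")"
TWO_CHARS = re.compile(r"(?<=,|\()(\w\w)(?=,|\))")


def codes(filename: str) -> list[str]:
    return TWO_CHARS.findall(filename)


print(codes("File (En,Fr,De,Es,It).doc"))  # ['En', 'Fr', 'De', 'Es', 'It']
print(codes("File (E,Fr).doc"))            # ['Fr']
# Caveat: any two word characters qualify, not just En/Fr/De/Es/It.
print(codes("File (Ab,12).doc"))           # ['Ab', '12']
```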
Working version.
Regards.