I am struggling to capture which "language" snippets a string contains. The language snippets are inside parentheses () and are a combination of: En,Fr,De,Es,It
Example:
File (En,Fr,De,Es,It).doc <== should match all 5 languages
File (En,Fr) (Required).doc <== should match `En` and `Fr`
File (Enfoo,Fr).doc <== should match only `Fr`
File (E,Fr).doc <== should match only `Fr`
My current regex:
((\\(|,)En(\\)|,))|((\\(|,)Fr(\\)|,))|((\\(|,)De(\\)|,))|((\\(|,)Es(\\)|,))|((\\(|,)It(\\)|,))
What it means:
((\(|,) <== either starts with `open parenthesis` or `comma` (1)
En <== the language (2)
(\)|,)) <== either ends with `close parenthesis` or `comma` (3)
Then I join the per-language patterns with the regex OR operator (|).
The problem, as you can see at regexr.com/3ev6p, is that a second language snippet, e.g. Fr, won't satisfy part (1) of the regex, because the first language snippet En has already captured the parenthesis or comma next to it. As a result, the second language snippet Fr is not matched.
Do you know how to capture all of the language snippets? I am planning to use PHP's preg_match_all() to get them. I hope somebody can help. Thank you!
The regex you have consumes the commas around the language codes. That means that after a match is found, the regex engine resumes searching after the trailing comma. The next language code then has no comma or parenthesis left in front of it to match, so it is skipped.
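To make the skipping visible, here is a minimal sketch. It uses a condensed form of the pattern from the question (character classes instead of the repeated alternations, which is a simplification made here for brevity; it consumes the same delimiters).

```php
<?php
// Condensed equivalent of the question's pattern: the delimiter on each
// side of the language code is part of the match, so it is consumed.
$consuming = '/[(,](En|Fr|De|Es|It)[,)]/';

preg_match_all($consuming, 'File (En,Fr,De,Es,It).doc', $m);

// $m[1] holds the captured language codes.
// "(En," consumes the comma that Fr needs, ",De," the one that Es needs.
echo implode(',', $m[1]), PHP_EOL; // prints "En,De,It"
```

Every second code is lost because its leading comma was already used as the trailing delimiter of the previous match.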
To find matches whose delimiters overlap like this, lookarounds can be used:
(?<=[(,])(En|Fr|De|Es|It)(?=[,)])
^^^^^^^^^                ^^^^^^^^
See this regex demo.
The (?<=[(,]) is a positive lookbehind that requires a , or ( immediately before the language code, and (?=[,)]) is a positive lookahead that requires a , or ) immediately to the right of the language code. The comma or parenthesis is not consumed, so it remains available to be matched during the next iteration.
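As a rough sketch of how the lookaround pattern could be used with preg_match_all(), the snippet below runs it over the four sample file names from the question. The helper name extractLanguages is hypothetical and only serves this illustration.

```php
<?php
// Hypothetical helper: returns the language codes found in a file name.
function extractLanguages(string $fileName): array
{
    // The lookbehind and lookahead check the delimiters without consuming them,
    // so one comma can serve two neighbouring language codes.
    $pattern = '/(?<=[(,])(En|Fr|De|Es|It)(?=[,)])/';
    preg_match_all($pattern, $fileName, $matches);

    return $matches[1]; // contents of the capturing group
}

$samples = [
    'File (En,Fr,De,Es,It).doc',
    'File (En,Fr) (Required).doc',
    'File (Enfoo,Fr).doc',
    'File (E,Fr).doc',
];

foreach ($samples as $sample) {
    echo $sample, ' => ', implode(',', extractLanguages($sample)), PHP_EOL;
}
```

With these inputs the helper should return all five codes, then En and Fr, then only Fr for each of the last two names.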
Another solution that is possible here is the use of word boundaries (as is already described in the comments). Word boundaries help match whole words.
\b(En|Fr|De|Es|It)\b
See the regex demo
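A short sketch of the word-boundary variant follows. The second input is a made-up file name, added here to show that this pattern does not check for parentheses, so a language code standing alone anywhere in the name is matched too.

```php
<?php
// Word boundaries only require a non-word character (or the string edge)
// on each side; they do not require parentheses or commas.
$pattern = '/\b(En|Fr|De|Es|It)\b/';

preg_match_all($pattern, 'File (Enfoo,Fr).doc', $inside);
preg_match_all($pattern, 'It (En).doc', $outside); // made-up name for illustration

echo implode(',', $inside[1]), PHP_EOL;  // prints "Fr"
echo implode(',', $outside[1]), PHP_EOL; // prints "It,En"
```

If codes outside the parentheses must be ignored, the lookaround version is the safer choice.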
This should match all:
(?<=,|\()(\w\w)(?=,|\))
Combined with preg_match_all, it should do the job.
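Explained: (?<=,|\() is a lookbehind that requires a comma or an opening parenthesis before the code, (\w\w) captures two word characters, and (?=,|\)) is a lookahead that requires a comma or a closing parenthesis after it. Note that \w\w accepts any two word characters, not only the five listed codes.
And that's it. :)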
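A minimal sketch of the (\w\w) pattern with preg_match_all() is shown here. The second file name is invented for this example to show that any two-character token between the delimiters is returned, so the result may need filtering against the list of known codes.

```php
<?php
$pattern = '/(?<=,|\()(\w\w)(?=,|\))/';

preg_match_all($pattern, 'File (En,Fr) (Required).doc', $known);
preg_match_all($pattern, 'File (Xx,Fr).doc', $unknown); // invented name

echo implode(',', $known[1]), PHP_EOL;   // prints "En,Fr"
echo implode(',', $unknown[1]), PHP_EOL; // prints "Xx,Fr"

// Optional filtering step (an addition, not part of the pattern itself).
$allowed = ['En', 'Fr', 'De', 'Es', 'It'];
$filtered = array_values(array_intersect($unknown[1], $allowed));
echo implode(',', $filtered), PHP_EOL;   // prints "Fr"
```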
Regards.