There is a text like this (many lines)
1. sdfsdf werwe werwemax45 rwrwerwr
2. 34348878 max max44444445666 sdf
3. 4353424 23423eedf max55 dfdg dfgdf
4. max45
5. 4324234234sdfsdf maxx34534
Using regular expressions, I need to match every line and, where a line contains a word of the form max<digits> (with actual digits in place of the literal <digits>), capture that word in a matching group.
So I've tried this regular expression:
^.*?\b(max\d+)\b.*?$
But it matches only the lines containing max... and ignores the others.
Then I've tried
^.*?\b(max\d+)?\b.*?$
It matches all lines, but the matching group does not contain max...
The issue can be "debugged" with a slightly modified pattern, ^(.*?)\b(max\d+)?\b(.*?)$, in which the rest of the pattern is wrapped in separate capturing groups. You can see that the lines are all matched by the Group 3 pattern, the last .*?. This happens because the first .*? is skipped (it is a lazy pattern, so it first tries to match nothing), then (max\d+)? matches an empty string at the start of the line (unless the line itself begins with max + digits, in which case that word is captured), and the last .*? captures the whole line.
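The behaviour described above can be reproduced with a short Python sketch. The choice of Python's re module, the helper name debug_groups and the two sample lines are assumptions made for illustration only; the question does not say which regex engine is in use.

```python
import re

# Debug variant of the second attempt: each part of the pattern gets its own group.
DEBUG_PATTERN = re.compile(r"^(.*?)\b(max\d+)?\b(.*?)$")


def debug_groups(line):
    """Return (group 1, group 2, group 3) for one line (hypothetical helper).

    Assumes the line starts with a word character, as the sample lines do.
    """
    return DEBUG_PATTERN.match(line).groups()


# Sample lines modelled on the question's text (list numbering left out).
for line in ["34348878 max max44444445666 sdf", "max45"]:
    print(debug_groups(line))
# ('', None, '34348878 max max44444445666 sdf')
# ('', 'max45', '')
```

In the first line Group 2 is None and Group 3 holds the whole line; Group 2 is only filled when the line itself starts with max + digits.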
You can fix it by wrapping the first part in an optional non-capturing group, so that max\d+ sits in an obligatory capturing group inside it:
^(?:.*?\b(max\d+)\b)?.*?$
Or even without the ?$ at the end, since .* will match greedily up to the end of the line:
^(?:.*?\b(max\d+)\b)?.*
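As a rough check of the fixed pattern, here is a minimal Python sketch that runs it over a multi-line string with the multiline option. The function name find_max_words and the sample text (the question's lines without the list numbering) are assumptions for illustration, not part of the original post.

```python
import re

# Fixed pattern: the prefix plus the max<digits> word form one optional group,
# so the trailing .* always consumes the line whether or not a word was found.
FIXED_PATTERN = re.compile(r"^(?:.*?\b(max\d+)\b)?.*", re.MULTILINE)


def find_max_words(text):
    """Return a (line, captured word or None) pair for every line (hypothetical helper)."""
    return [(m.group(0), m.group(1)) for m in FIXED_PATTERN.finditer(text)]


# Sample text modelled on the question (list numbering left out).
sample = "\n".join([
    "sdfsdf werwe werwemax45 rwrwerwr",
    "34348878 max max44444445666 sdf",
    "4353424 23423eedf max55 dfdg dfgdf",
    "max45",
    "4324234234sdfsdf maxx34534",
])

for line, word in find_max_words(sample):
    print(repr(word), "<-", line)
```

Every line is matched; Group 1 is None for the first and last lines, because werwemax45 has no word boundary before max and maxx34534 has no digit right after max.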
Details
^ - start of string (with the m option, start of a line)
(?:.*?\b(max\d+)\b)? - an optional non-capturing group:
  .*? - any 0+ chars other than line break chars, as few as possible
  \b - a word boundary
  (max\d+) - Group 1 (obligatory, will be tried once): max and 1+ digits
  \b - a word boundary
.* - rest of the line