I have a regex that tries to match version numbers, but it generates a lot of false positives.
(\d{1,3}).*?(\d{1,3}).*?(\d{1,3})
is what I have so far. It matches anything with 3 parts and 2 dots, e.g.
1.2.333
11.2.3
However, it doesn't match things with 2 parts and 1 dot, e.g. 1.2
It is also too greedy: a line with multiple dots and parts, e.g. 11.22.33 . 44.55.66 .77, will match twice.
I'm looking for a regex that will cover all scenarios:
1.2
1.2.3
and only match the first instance (the first 3 parts) of 1.2.3.4.5.6.7.8
EDIT: I think
^\d{1,3}(?:\.\d{1,3})(?:\.\d{1,3})?
will be as close as I can get to covering the bulk of what I want so far. It still doesn't pick out the first 3 parts of a long list, though. I'll keep trying.
The regex you want is:
(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*
The final subexpression, (?:\.\d{1,3})*, is included to consume the rest of a longer dotted sequence without capturing it. If it were left out, the remaining parts would themselves be matched when the findall scan resumed, as happens with an input such as 1.2.3 1.2.3.4.5.6.7.8.
import re
s = 'abc 1.2 1.2.3 1.2.3.4.5.6.7.8'
print(re.findall(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*', s))
Prints:
['1.2', '1.2.3', '1.2.3']
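To see what the trailing subexpression buys you, here is a small comparison sketch. The variable names are only for illustration; the two patterns are the one above with and without its final (?:\.\d{1,3})* part.

```python
import re

s = '1.2.3.4.5.6.7.8'

# Without the trailing (?:\.\d{1,3})*, findall resumes right after '1.2.3'
# and keeps finding versions in the remainder of the same dotted run.
without_tail = re.findall(r'(\d{1,3}(?:\.\d{1,3}){1,2})', s)

# With it, the remainder is consumed (but not captured) by the same match.
with_tail = re.findall(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*', s)

print(without_tail)  # ['1.2.3', '4.5.6', '7.8']
print(with_tail)     # ['1.2.3']
```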
Alternatively, you can use a negative lookbehind:
((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})
import re
s = 'abc 1.2 1.2.3 1.2.3.4.5.6.7.8'
print(re.findall(r'((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})', s))
Prints:
['1.2', '1.2.3', '1.2.3']
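One caveat with the lookbehind version: (?<!\.) only checks the single character before the match, so when a later part has more than one digit, a match can still start at the second digit of that part. The sketch below shows this, together with a stricter lookbehind, (?<![\d.]), which is my own suggested variation and not something the question asked for.

```python
import re

s = '1.2.3.44.55.66'

# (?<!\.) rejects the first '4' of '44' (it follows a dot), but the
# second '4' follows a digit, so a spurious match starts there.
original = re.findall(r'((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})', s)

# Assumed variation: also refuse to start right after a digit.
stricter = re.findall(r'((?<![\d.])\d{1,3}(?:\.\d{1,3}){1,2})', s)

print(original)  # ['1.2.3', '4.55.66']
print(stricter)  # ['1.2.3']
```

On the sample string used above ('abc 1.2 1.2.3 1.2.3.4.5.6.7.8') both variants give the same result, because every part there is a single digit.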
If you are using search instead of findall, the match is returned as Group 1.
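As a rough sketch of the search case, using the first pattern: group(1) holds the version, while group(0) is the whole match, including any extra parts that were consumed. The helper name first_version is just something I made up for the example.

```python
import re

pattern = re.compile(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*')

def first_version(text):
    """Return the first version-like token in text, or None if there is none."""
    m = pattern.search(text)
    # group(1) is the captured 2- or 3-part version;
    # group(0) would also include the consumed trailing parts.
    return m.group(1) if m else None

print(first_version('abc 1.2.3.4.5.6.7.8 and 9.9'))  # 1.2.3
print(first_version('no version here'))              # None
```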
Use
(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b
Python code:
re.findall(r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b', string)
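Because the pattern is anchored with ^ in multiline mode, findall can return at most one version per line, namely the first one. A small sketch with a made-up multi-line input (the sample text is mine, partly built from the strings in the question):

```python
import re

text = (
    "abc 1.2 and 1.2.3\n"
    "11.22.33 . 44.55.66 .77\n"
    "no version here\n"
    "1.2.3.4.5.6.7.8"
)

# One attempt per line: ^ only matches at line starts under (?m),
# and .*? cannot cross a newline.
versions = re.findall(r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b', text)

print(versions)  # ['1.2', '11.22.33', '1.2.3']
```

Lines without a version simply contribute nothing to the list.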
Explanation
--------------------------------------------------------------------------------
  (?m)        set flags for this block (with ^ and $ matching start and
              end of line) (case-sensitive) (with . not matching \n)
              (matching whitespace and # normally)
--------------------------------------------------------------------------------
  ^           the beginning of a "line"
--------------------------------------------------------------------------------
  .*?         any character except \n (0 or more times (matching the
              least amount possible))
--------------------------------------------------------------------------------
  \b          the boundary between a word char (\w) and something that
              is not a word char
--------------------------------------------------------------------------------
  (           group and capture to \1:
--------------------------------------------------------------------------------
    \d{1,3}   digits (0-9) (between 1 and 3 times (matching the most
              amount possible))
--------------------------------------------------------------------------------
    \.        '.'
--------------------------------------------------------------------------------
    \d{1,3}   digits (0-9) (between 1 and 3 times (matching the most
              amount possible))
--------------------------------------------------------------------------------
    (?:       group, but do not capture (optional (matching the most
              amount possible)):
--------------------------------------------------------------------------------
      \.      '.'
--------------------------------------------------------------------------------
      \d{1,3} digits (0-9) (between 1 and 3 times (matching the most
              amount possible))
--------------------------------------------------------------------------------
    )?        end of grouping
--------------------------------------------------------------------------------
  )           end of \1
--------------------------------------------------------------------------------
  \b          the boundary between a word char (\w) and something that
              is not a word char
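If you want to keep that explanation next to the pattern in your code, one option is to write the same regex with re.VERBOSE, so each piece carries its own comment. This is only a sketch; the name VERSION_AT_LINE and the sample input are made up for illustration.

```python
import re

VERSION_AT_LINE = re.compile(r'''
    ^               # beginning of a line (needs re.MULTILINE)
    .*?             # as few characters as possible, never a newline
    \b              # word boundary before the version
    (               # start of capture group 1
        \d{1,3}     # first part: 1 to 3 digits
        \.          # literal dot
        \d{1,3}     # second part
        (?:         # optional third part, not captured separately
            \.
            \d{1,3}
        )?
    )               # end of group 1
    \b              # word boundary after the version
''', re.VERBOSE | re.MULTILINE)

found = VERSION_AT_LINE.findall('release 10.4\nbuild 2.0.1.9')
print(found)  # ['10.4', '2.0.1']
```

The flags passed to re.compile replace the inline (?m); the behaviour should otherwise be the same as the one-line pattern.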