Regex - stop after first match with version numbers

Question

I have a regex that tries to match version numbers, but it generates a lot of false positives.

(\d{1,3}).*?(\d{1,3}).*?(\d{1,3})

This is what I have so far. It matches anything with 3 parts and 2 dots, e.g. 1.2.333 or 11.2.3.

However, it doesn't match things with 2 parts and 1 dot, e.g. 1.2.

It is also too greedy: on a line with multiple dots and parts, e.g. 11.22.33 . 44.55.66 .77, it will match twice.

I'm looking for a regex that covers all scenarios:

1.2, 1.2.3, and only the first instance in 1.2.3.4.5.6.7.8

Pythex checker

EDIT: I think ^\d{1,3}(?:\.\d{1,3})(?:\.\d{1,3})? will be as close as I can get to covering the bulk of what I want so far. It still doesn't pick out the first 3 parts of a long list though; I'll keep trying.
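
To make the problems described above concrete, here is a small sketch that runs the original pattern from the question against a few inputs. The sample strings and the variable names are made up for illustration; only the pattern itself comes from the question.

```python
import re

# The pattern from the question. ".*?" matches ANY characters, not just a dot,
# which is where the false positives come from.
ORIGINAL = re.compile(r'(\d{1,3}).*?(\d{1,3}).*?(\d{1,3})')

# Hypothetical sample: three unrelated numbers are still treated as one "version".
unrelated = ORIGINAL.findall('room 12, floor 3, desk 7')

# The line from the question: findall reports two matches.
double = ORIGINAL.findall('11.22.33 . 44.55.66 .77')

# A two-part version is not matched at all, because three digit groups are required.
two_part = ORIGINAL.findall('1.2')

print(unrelated)  # [('12', '3', '7')]
print(double)     # [('11', '22', '33'), ('44', '55', '66')]
print(two_part)   # []
```

Replacing each ".*?" with an escaped dot and making the third part optional, as the answers below do, removes all three problems.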

Answer 1

The regex you want is:

(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*

The final subexpression, (?:\.\d{1,3})*, consumes the rest of a longer dotted sequence. Without it, findall would resume scanning inside that remainder and return extra matches: for 1.2.3.4.5.6.7.8 it would also return 4.5.6 and 7.8.

See Regex Demo

import re

s = 'abc 1.2 1.2.3 1.2.3.4.5.6.7.8'

print(re.findall(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*', s))

Prints:

['1.2', '1.2.3', '1.2.3']

Alternatively, you can use a negative lookbehind:

((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})

See Regex Demo

import re

s = 'abc 1.2 1.2.3 1.2.3.4.5.6.7.8'
print(re.findall(r'((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})', s))

Prints:

['1.2', '1.2.3', '1.2.3']
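
One caveat worth checking before choosing between the two patterns: the sample string above only has single-digit parts after the third one. The sketch below compares both patterns on a multi-digit input. The third pattern, which also rejects a preceding digit in the lookbehind, is not part of the original answer; it is an assumed variation shown only for comparison.

```python
import re

# Pattern 1 from the answer: consume the tail of a long dotted sequence.
CONSUME_TAIL = re.compile(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*')

# Pattern 2 from the answer: refuse to start right after a dot.
LOOKBEHIND = re.compile(r'((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})')

# Assumed variation (not from the answer): also refuse to start right after a digit,
# so a match cannot begin in the middle of a multi-digit part.
STRICT_LOOKBEHIND = re.compile(r'((?<![.\d])\d{1,3}(?:\.\d{1,3}){1,2})')

s = '11.22.33.44.55.66'

consume_result = CONSUME_TAIL.findall(s)
lookbehind_result = LOOKBEHIND.findall(s)
strict_result = STRICT_LOOKBEHIND.findall(s)

print(consume_result)     # ['11.22.33']
print(lookbehind_result)  # ['11.22.33', '4.55.66']
print(strict_result)      # ['11.22.33']
```

With the plain lookbehind, the second "4" of "44" is preceded by a digit rather than a dot, so a new match can start there. If the input can contain multi-digit parts beyond the third one, the tail-consuming pattern or the stricter lookbehind appears to be the safer choice.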

If you are using search instead of findall, the match is returned as group 1.
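
A minimal sketch of the search variant, using the first pattern from this answer. The sample string and variable names are made up for illustration.

```python
import re

pattern = re.compile(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*')

# search stops at the first match in the string.
m = pattern.search('abc 1.2.3.4.5.6.7.8 and 9.9')

# Group 1 holds the version (at most three parts);
# group 0 is the whole match, including the consumed tail.
first_version = m.group(1) if m else None
whole_match = m.group(0) if m else None

# search returns None when nothing matches, so guard before calling group().
no_match = pattern.search('no digits here')

print(first_version)  # 1.2.3
print(whole_match)    # 1.2.3.4.5.6.7.8
print(no_match)       # None
```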

Answer 2

Use

(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b

See proof.

Python code:

re.findall(r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b', string)
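
The call above assumes a variable named string already exists. As a runnable sketch, with a made-up multi-line sample, the pattern returns at most one version per line, because ^ can only match at the start of a line:

```python
import re

# Hypothetical sample text; several lines contain more than one version-like token.
text = (
    'abc 1.2 1.2.3\n'
    'release 11.22.33 . 44.55.66 .77\n'
    '1.2.3.4.5.6.7.8\n'
    'no version here'
)

# (?m) makes ^ match at the start of every line; the lazy .*? skips ahead
# to the first place on that line where a version can be matched.
first_per_line = re.findall(r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b', text)

print(first_per_line)  # ['1.2', '11.22.33', '1.2.3']
```

Lines without a version simply contribute nothing to the result.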

Explanation

--------------------------------------------------------------------------------
  (?m)                     set flags for this block (with ^ and $
                           matching start and end of line) (case-
                           sensitive) (with . not matching \n)
                           (matching whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d{1,3}                  digits (0-9) (between 1 and 3 times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \d{1,3}                  digits (0-9) (between 1 and 3 times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
      \d{1,3}                  digits (0-9) (between 1 and 3 times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
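
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace and allows comments inside the pattern. This is only a sketch of an equivalent form; the constant name and the sample input are made up, and the flags are passed as arguments instead of the inline (?m).

```python
import re

VERSION_AT_LINE_START = re.compile(
    r"""
    ^                 # start of a line (re.MULTILINE is set below)
    .*?               # as few characters as possible, never a newline
    \b                # word boundary before the first digit
    (                 # group 1: the version number
        \d{1,3}       #   first part, 1 to 3 digits
        \.            #   literal dot
        \d{1,3}       #   second part, 1 to 3 digits
        (?:           #   optional third part, not captured
            \.
            \d{1,3}
        )?
    )
    \b                # word boundary after the last digit
    """,
    re.VERBOSE | re.MULTILINE,
)

# Hypothetical two-line input.
versions = VERSION_AT_LINE_START.findall('build 1.2.3.4\nv 10.20')
print(versions)  # ['1.2.3', '10.20']
```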

2 answers

solution1
1 2021-01-14 17:05:02

solution2
1 ACCEPTED 2021-01-14 23:15:39
