
Regex - stop after first match with version numbers

I have a regex that tries to match version numbers; however, it generates a lot of false positives.

(\d{1,3}).*?(\d{1,3}).*?(\d{1,3})

That is what I have so far, and it matches anything with 3 parts and 2 dots, e.g. 1.2.333 and 11.2.3.

However, it doesn't match versions with 2 parts and 1 dot, such as 1.2.

It is also too greedy: on a line with multiple dots and parts, e.g. 11.22.33 . 44.55.66 .77, it matches twice.
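Both problems show up when the original pattern is run through re.findall. Because .*? accepts any characters, the three digit groups do not have to be separated by dots at all. The sample strings below are made up for illustration, and `original` is just a local name.

```python
import re

# The original pattern: three 1-3 digit groups separated by *anything*.
original = r'(\d{1,3}).*?(\d{1,3}).*?(\d{1,3})'

# False positive: no dots at all, yet three digit groups are found.
print(re.findall(original, 'room 12, floor 3, building 4'))  # [('12', '3', '4')]

# Two parts and one dot: not matched, since three digit groups are required.
print(re.findall(original, 'version 1.2'))  # []

# findall resumes after each match, so a line with two versions matches twice.
print(re.findall(original, '11.22.33 . 44.55.66 .77'))
# [('11', '22', '33'), ('44', '55', '66')]
```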

I'm looking for a regex that covers all of these scenarios: 1.2, 1.2.3, and only the first match (the first 3 parts) of 1.2.3.4.5.6.7.8.

Pythex checker

EDIT: I think ^\d{1,3}(?:\.\d{1,3})(?:\.\d{1,3})? is as close as I can get so far to covering the bulk of what I want. It still doesn't pick out the first 3 parts of a long list, though; I'll keep trying.
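A quick check of the EDIT pattern with re.search, on sample strings chosen for illustration (`edit_pattern` is a local name). When a long version sits at the very start of the string, the pattern does return its first three parts; a version that appears later in the line is not found at all, because ^ anchors the match to the start (or to each line start with re.MULTILINE).

```python
import re

# Pattern from the EDIT: 2 or 3 dot-separated parts, anchored with ^.
edit_pattern = r'^\d{1,3}(?:\.\d{1,3})(?:\.\d{1,3})?'

for text in ['1.2', '1.2.3', '1.2.3.4.5.6.7.8', 'version 1.2.3']:
    m = re.search(edit_pattern, text)
    # m is None when no version starts at position 0
    print(repr(text), '->', m.group() if m else None)
```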

The regex you want is:

(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*

The final subexpression, (?:\.\d{1,3})*, consumes any remaining dotted parts of a longer version without capturing them. Without it, in an input such as 1.2.3 1.2.3.4.5.6.7.8, findall would resume scanning right after the captured 1.2.3 and return the leftover parts (4.5.6 and 7.8) as extra matches.
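A side-by-side sketch of the two variants on the input mentioned above (`without_tail` and `with_tail` are local names chosen for illustration):

```python
import re

s = '1.2.3 1.2.3.4.5.6.7.8'

# Without the trailing (?:\.\d{1,3})*, findall restarts right after "1.2.3"
# inside the long version and picks up its later parts as extra matches.
without_tail = r'(\d{1,3}(?:\.\d{1,3}){1,2})'
print(re.findall(without_tail, s))  # ['1.2.3', '1.2.3', '4.5.6', '7.8']

# With it, the leftover ".4.5.6.7.8" is consumed but not captured.
with_tail = r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*'
print(re.findall(with_tail, s))  # ['1.2.3', '1.2.3']
```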

See Regex Demo

import re

s = 'abc 1.2 1.2.3 1.2.3.4.5.6.7.8'

print(re.findall(r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*', s))

Prints:

['1.2', '1.2.3', '1.2.3']

Alternatively, you can use a negative lookbehind, so that a match cannot start right after a dot or in the middle of a number:

((?<![\d.])\d{1,3}(?:\.\d{1,3}){1,2})

See Regex Demo

import re

s = 'abc 1.2 1.2.3 1.2.3.4.5.6.7.8'
print(re.findall(r'((?<![\d.])\d{1,3}(?:\.\d{1,3}){1,2})', s))

Prints:

['1.2', '1.2.3', '1.2.3']
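Which characters the lookbehind excludes matters once a later component has more than one digit. In this sketch (the input 11.22.33.44.55 is made up for illustration), (?<!\.) alone still lets a match start at the second digit of 44, whereas (?<![\d.]) rejects any start position that follows a dot or a digit.

```python
import re

s = '11.22.33.44.55'

# Lookbehind that excludes only a preceding dot: the second "4" of "44" is
# preceded by a digit, so a new match can start there.
dot_only = r'((?<!\.)\d{1,3}(?:\.\d{1,3}){1,2})'
print(re.findall(dot_only, s))  # ['11.22.33', '4.55']

# Excluding digits as well as dots prevents a match from starting mid-number.
dot_or_digit = r'((?<![\d.])\d{1,3}(?:\.\d{1,3}){1,2})'
print(re.findall(dot_or_digit, s))  # ['11.22.33']
```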

If you use re.search instead of re.findall, only the first match is returned, and the version is in group 1.
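A minimal sketch with the first pattern and an illustrative input. Because that pattern also consumes the trailing parts, group(0) holds the whole dotted run, while group(1) holds just the first three parts.

```python
import re

s = 'build 1.2.3.4.5.6.7.8'
pattern = r'(\d{1,3}(?:\.\d{1,3}){1,2})(?:\.\d{1,3})*'

m = re.search(pattern, s)  # first match only, or None
if m:
    print(m.group(0))  # whole match, including the consumed tail: 1.2.3.4.5.6.7.8
    print(m.group(1))  # just the captured version: 1.2.3
```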

Use

(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b

See proof.

Python code:

re.findall(r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b', string)
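To see the "first match per line" behavior, here is the same call on a made-up multi-line string (`text` stands in for `string` above):

```python
import re

pattern = r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b'

text = '\n'.join([
    'abc 1.2 1.2.3',
    '11.22.33 . 44.55.66 .77',
    'no version here',
    '1.2.3.4.5.6.7.8',
])

# (?m) makes ^ match at every line start; the lazy .*? stops at the first
# version on each line, and findall can only resume at the next line start.
print(re.findall(pattern, text))  # ['1.2', '11.22.33', '1.2.3']
```

Lines without a version simply contribute nothing, and a long version yields only its first three parts because the trailing \b is satisfied by the dot that follows them.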

Explanation

--------------------------------------------------------------------------------
  (?m)                     set flags for this block (with ^ and $
                           matching start and end of line) (case-
                           sensitive) (with . not matching \n)
                           (matching whitespace and # normally)
--------------------------------------------------------------------------------
  ^                        the beginning of a "line"
--------------------------------------------------------------------------------
  .*?                      any character except \n (0 or more times
                           (matching the least amount possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \d{1,3}                  digits (0-9) (between 1 and 3 times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    \.                       '.'
--------------------------------------------------------------------------------
    \d{1,3}                  digits (0-9) (between 1 and 3 times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
--------------------------------------------------------------------------------
      \.                       '.'
--------------------------------------------------------------------------------
      \d{1,3}                  digits (0-9) (between 1 and 3 times
                               (matching the most amount possible))
--------------------------------------------------------------------------------
    )?                       end of grouping
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
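The breakdown above can also be kept inside the pattern itself. This sketch writes the same regex with re.VERBOSE, where whitespace and # comments in the pattern are ignored, and checks it against the compact form on an illustrative input:

```python
import re

# Same pattern as above, written with re.VERBOSE so each part can be commented.
verbose = re.compile(r'''
    ^ .*?              # from the start of each line, skip as little as possible
    \b                 # the version must begin at a word boundary
    (                  # group 1: the version itself
        \d{1,3} \. \d{1,3}      # two parts, e.g. 1.2
        (?: \. \d{1,3} )?       # optional third part, e.g. .3
    )
    \b                 # and end at a word boundary
''', re.MULTILINE | re.VERBOSE)

compact = r'(?m)^.*?\b(\d{1,3}\.\d{1,3}(?:\.\d{1,3})?)\b'

text = 'abc 1.2 1.2.3\n1.2.3.4.5.6.7.8'
print(verbose.findall(text))  # ['1.2', '1.2.3']
print(re.findall(compact, text) == verbose.findall(text))  # True
```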

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
