
Python regex to match multiple patterns in a given string

From the string below, I need the values of Version:, Build Number: and perforce_url:. Currently, I extract each of these matches separately. I'd like to simplify my code and get all of them in a single line.

x = '''Version: 2.2.4125
Build Number: 125
Project Name: xyz.master
Git Url: git+ssh://git@stash.xyz.com:123/ab/dashboard
Git Branch: origin/master
Git Built Data: qw123ed45rfgt689090gjlllb
perforce_url:
  //projects/f5/dashboard/1.3/xyz/portal/
artifacts:
   "..//www/":     www/ '''

I have used re.search to extract the values of Version:, Build Number: and perforce_url: separately, one call per field. However, I'd like to simplify this and get it done in a single line.

import re

# re.search scans the whole string; re.match would only match at the start of x,
# so it would find Version: but never perforce_url: or Build Number:
matchObj = re.search(r'Version:\s*(.*)\n', x)
if matchObj:
    print(matchObj.group(1))

matchObj = re.search(r'perforce_url:\s*(.*)\n', x)
if matchObj:
    print(matchObj.group(1))

matchObj = re.search(r'Build Number:\s*(.*)\n', x)
if matchObj:
    print(matchObj.group(1))

I tried the following pattern:

Version: \s*(.*)\n|perforce_url:\s*(.*)\n

But it did not work. I want to create a list and append the matches to it using:

results = []
results.append(matchObj.group(1))

Expected result :

['2.2.4125', '//projects/f5/dashboard/1.3/xyz/portal/', '125']

Actual result:

2.2.4125
//projects/f5/dashboard/1.3/xyz/portal/
125

You could match Version and Build Number on consecutive lines, capturing each value in its own group.

For perforce_url, you could use a repeating group with a negative lookahead, (?:\n(?!perforce).*)*, to match the intervening lines as long as they don't start with perforce.

When a line does start with perforce_url, match its value using a capturing group:

Version:\s*(.*)\nBuild Number:\s*(.*)(?:\n(?!perforce).*)*\nperforce_url:\s*(.*)
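To see what the repeating group with the negative lookahead consumes on its own, here is a minimal sketch. The shortened text and the extra capturing group wrapped around the repeating part are assumptions added for illustration only; they are not part of the pattern above.

```python
import re

# Shortened sample (an assumption for illustration, not the full input)
text = ("Build Number: 125\n"
        "Project Name: xyz.master\n"
        "Git Branch: origin/master\n"
        "perforce_url:\n"
        "  //projects/f5/dashboard/1.3/xyz/portal/")

# The outer parentheses capture everything the repeating group consumes.
# Each repetition takes one newline plus the rest of that line, but only
# if the line does not start with "perforce".
m = re.match(r"Build Number:.*((?:\n(?!perforce).*)*)", text)
skipped = m.group(1).strip().splitlines()
print(skipped)
# ['Project Name: xyz.master', 'Git Branch: origin/master']
```

The repetition stops right before the newline that precedes perforce_url:, which is why the full pattern can continue with \nperforce_url: at that point.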


For example:

import re

regex = r"Version:\s*(.*)\nBuild Number:\s*(.*)(?:\n(?!perforce).*)*\nperforce_url:\s*(.*)"
x = ("Version: 2.2.4125\n"
            "Build Number: 125\n"
            "Project Name: xyz.master\n"
            "Git Url: git+ssh://git@stash.xyz.com:123/ab/dashboard\n"
            "Git Branch: origin/master\n"
            "Git Built Data: qw123ed45rfgt689090gjlllb\n"
            "perforce_url:\n"
            "  //projects/f5/dashboard/1.3/xyz/portal/\n"
            "artifacts:\n"
            "   \"..//www/\":     www/ ")

print(re.findall(regex, x))

Result

[('2.2.4125', '125', '//projects/f5/dashboard/1.3/xyz/portal/')]
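re.findall returns a list containing one tuple here, in the order the groups appear in the pattern. If you want the flat list from the question, with perforce_url in second position, one possible sketch is to use re.search and reorder the groups yourself. The names version, build, url and result are only illustrative, and the input is shortened.

```python
import re

regex = r"Version:\s*(.*)\nBuild Number:\s*(.*)(?:\n(?!perforce).*)*\nperforce_url:\s*(.*)"
# Shortened version of the input, for illustration
x = ("Version: 2.2.4125\n"
     "Build Number: 125\n"
     "Project Name: xyz.master\n"
     "Git Branch: origin/master\n"
     "perforce_url:\n"
     "  //projects/f5/dashboard/1.3/xyz/portal/\n"
     "artifacts:\n")

match = re.search(regex, x)
if match:
    # groups() follows the pattern order: Version, Build Number, perforce_url
    version, build, url = match.groups()
    # Reorder to match the expected result in the question
    result = [version, url, build]
else:
    result = []

print(result)
# ['2.2.4125', '//projects/f5/dashboard/1.3/xyz/portal/', '125']
```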

Based on @The fourth bird's answer, but with a slight twist. By using alternation (|) you can avoid the non-capturing group that skips the lines between "Build Number" and "perforce_url". That way the regex only contains what you explicitly want to target.

r"Version:\s*(.*)\n|Build Number:\s*(.*)\n|perforce_url:\s*(.*)\n"


Edit: realized non-capture groups around "Version", "Build" etc. were unnecessary
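One thing to be aware of with the alternation pattern: re.findall returns one tuple per match, and only the group belonging to the branch that matched is filled, so the other positions are empty strings. A minimal sketch of collapsing those tuples into a flat list follows. The names pattern, rows and values are illustrative, and the input is shortened.

```python
import re

pattern = r"Version:\s*(.*)\n|Build Number:\s*(.*)\n|perforce_url:\s*(.*)\n"
# Shortened version of the input, for illustration
x = ("Version: 2.2.4125\n"
     "Build Number: 125\n"
     "Project Name: xyz.master\n"
     "Git Branch: origin/master\n"
     "perforce_url:\n"
     "  //projects/f5/dashboard/1.3/xyz/portal/\n"
     "artifacts:\n")

rows = re.findall(pattern, x)
print(rows)
# [('2.2.4125', '', ''), ('', '125', ''), ('', '', '//projects/f5/dashboard/1.3/xyz/portal/')]

# Only one group per tuple is non-empty, so joining the tuple yields that value
values = ["".join(row) for row in rows]
print(values)
# ['2.2.4125', '125', '//projects/f5/dashboard/1.3/xyz/portal/']
```

The values come out in the order the fields appear in the text, not in the order of the alternatives in the pattern, so they would still need reordering if perforce_url has to come second.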
