How to match regex in python?

describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do

I want to extract sg-ezsrzerzer from it (that is, match from sg- up to the closing double quote). I'm using Python.

I currently have:

import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-.*\b', a)
print(test)

output is

['sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do']

How do I only get ['sg-ezsrzerzer'] ?

The pattern (?<=group_id=>").+?(?=") works if the goal is to extract the group_id value from a string formatted as in your example. Neither > nor " needs a backslash escape in a regex.

(?<=group_id=>") Looks behind for the substring group_id=>" immediately before the text to be matched.

.+? Lazily matches one or more of any character (except a newline).

(?=") Looks ahead for the character " following the match. Combined with the lazy quantifier, this stops the match at the first closing ".

If you only want to extract values where the group_id starts with sg-, add that to the matching part of the pattern: (?<=group_id=>")sg-.+?(?=")

import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Raw string: avoids the invalid-escape warning that '\>' triggers in a normal string
results = re.findall(r'(?<=group_id=>").+?(?=")', s)

print(results)

Output

['sg-ezsrzerzer']
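The example above uses the general pattern; the sg- restricted variant can be checked the same way. This is a minimal sketch, written as a raw string without the optional backslash escapes. The second input line (with the value other-123) is a made-up sample, not from the question, added only to show that non-sg- values are skipped.

```python
import re

# First line is from the question; the second is a hypothetical sample
# whose group_id does not start with "sg-".
lines = [
    'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do',
    'describe aws_security_group({:group_id=>"other-123", :vpc_id=>"vpc-zfds54zef4s"}) do',
]

# Lookbehind for group_id=>", then require the value to begin with sg-.
sg_only = re.compile(r'(?<=group_id=>")sg-.+?(?=")')

matches = [sg_only.findall(line) for line in lines]
print(matches)  # [['sg-ezsrzerzer'], []]
```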

Of course you could alternatively use re.search instead of re.findall to find the first instance of a sub-string matching the above pattern in a given string - depends on your use case I suppose.

import re

s = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

# Raw string: avoids the invalid-escape warning that '\>' triggers in a normal string
result = re.search(r'(?<=group_id=>").+?(?=")', s)

if result:
    result = result.group()

print(result)

Output

sg-ezsrzerzer

If you decide to use re.search, note that it returns None if no match is found in the input string and an re.Match object if there is one. Hence the if statement and the call to result.group() in the example above, which extracts the matching string when present.
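The None check can be wrapped in a small helper so callers never touch the match object directly. This is only a sketch: the function name first_group_id and the second test string are hypothetical, introduced here for illustration.

```python
import re
from typing import Optional

GROUP_ID = re.compile(r'(?<=group_id=>").+?(?=")')

def first_group_id(text: str) -> Optional[str]:
    """Return the first group_id value in text, or None if there is none."""
    match = GROUP_ID.search(text)
    # search() returns None when nothing matches, so guard before .group()
    return match.group() if match else None

found = first_group_id('describe aws_security_group({:group_id=>"sg-ezsrzerzer"}) do')
missing = first_group_id('describe aws_vpc({:vpc_id=>"vpc-zfds54zef4s"}) do')
print(found, missing)
```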

The pattern \bsg-.*\b matches too much because .* is greedy: it runs to the end of the string, and the final \b is then satisfied at the last word boundary, which is after the o of do at the very end of the string.
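A short comparison makes the behaviour concrete. It is only a sketch; the lazy and negated-class variants are added here for contrast and are not part of the question. Note that simply making the quantifier lazy does not help, because a word boundary already exists right after the hyphen.

```python
import re

a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'

greedy = re.findall(r'\bsg-.*\b', a)    # .* runs to the end of the string
lazy = re.findall(r'\bsg-.*?\b', a)     # stops at the boundary between - and e
negated = re.findall(r'\bsg-[^"]*', a)  # stops just before the closing quote

print(greedy)   # ['sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do']
print(lazy)     # ['sg-']
print(negated)  # ['sg-ezsrzerzer']
```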


If you are using re.findall you can also use a capture group instead of lookarounds and the group value will be in the result.

:group_id=>"(sg-[^"\r\n]+)"

The pattern matches:

  • :group_id=>" Match literally
  • (sg-[^"\r\n]+) Capture group 1: match sg- followed by one or more characters other than " or a newline
  • " Match the double quote

For example

import re

pattern = r':group_id=>"(sg-[^"\r\n]+)"'
s = "describe aws_security_group({:group_id=>\"sg-ezsrzerzer\", :vpc_id=>\"vpc-zfds54zef4s\"}) do"

print(re.findall(pattern, s))

Output

['sg-ezsrzerzer']
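Because re.findall returns the captured group for every match, the same pattern also works across a whole file with several describe blocks. A minimal sketch follows; the multi-line spec text and the second ID sg-abc123 are made up for illustration.

```python
import re

# Hypothetical spec text with two security groups (the second ID is invented).
spec = '''describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do
end
describe aws_security_group({:group_id=>"sg-abc123", :vpc_id=>"vpc-zfds54zef4s"}) do
end
'''

pattern = re.compile(r':group_id=>"(sg-[^"\r\n]+)"')

# With exactly one capture group, findall returns a list of that group's values.
group_ids = pattern.findall(spec)
print(group_ids)  # ['sg-ezsrzerzer', 'sg-abc123']
```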

Match up to the first non-word character with \w+:

import re
a = 'describe aws_security_group({:group_id=>"sg-ezsrzerzer", :vpc_id=>"vpc-zfds54zef4s"}) do'
test = re.findall(r'\bsg-\w+', a)
print(test[0])

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  sg-                      'sg-'
--------------------------------------------------------------------------------
  \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                           more times (matching the most amount
                           possible))

Results: sg-ezsrzerzer
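One trade-off of \bsg-\w+ is that it is not tied to the group_id key, so it picks up any sg- token in the string. The sketch below uses a hypothetical input with an extra description field (not from the question) to show the difference from a pattern anchored on :group_id=>".

```python
import re

# Hypothetical input: "sg-legacy" appears in a description, not as a group_id.
text = '{:description=>"old sg-legacy rule", :group_id=>"sg-ezsrzerzer"}'

anywhere = re.findall(r'\bsg-\w+', text)                # every sg- token
anchored = re.findall(r':group_id=>"(sg-\w+)"', text)   # only the group_id value

print(anywhere)  # ['sg-legacy', 'sg-ezsrzerzer']
print(anchored)  # ['sg-ezsrzerzer']
```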

