
Regex to match all words that start and end with specific characters in a string

How can I fix my regex pattern to match every word that starts with "x" and ends with "z"?

Code:

import re

#input
s = "xaz xazx xaxsza zsxdaszdx zasxz xaaz xaaaz"

pattern1 = "x.*z"
pattern2 = "\bx.*z\b"
pattern3 = "x.*?z"
pattern4 = "\b^x.*z$\b"
pattern5 = "\Bx.*z\B"
#also tried using \s, \S, ^ and $... 

re.findall(pattern1, s)  # tried with each of the patterns above in turn

Desired Output:

out = ["xaz", "xaaz", "xaaaz"] 

How can I achieve this?


A couple of notes on your patterns:

  • "x.*z" - matches x, then any chars other than line breaks, as many as possible, up to the last occurrence of z
  • "\bx.*z\b" - a backspace character, then the same as above, and again a backspace character
  • "x.*?z" - an x, then any chars other than line breaks, as few as possible, up to the first occurrence of z
  • "\b^x.*z$\b" - a backspace character followed by the start of the string, which already guarantees a failure, then any 0+ chars up to a z followed by the end of the string, and then a backspace character
  • "\Bx.*z\B" - a non-word boundary, x, any 0+ chars up to the last z that is not followed by a word boundary.

You need to use a raw string literal so that \b denotes a word boundary rather than a backspace character.
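As a quick illustration of the difference, the sketch below compares the asker's second pattern written as a regular string and as a raw string. The variable names `plain` and `raw` are only illustrative.

```python
import re

s = "xaz xazx xaxsza zsxdaszdx zasxz xaaz xaaaz"

# Regular string literal: Python turns "\b" into a backspace (U+0008)
# before the regex engine ever sees the pattern.
plain = "\bx.*z\b"

# Raw string literal: the backslash is kept, so the regex engine
# receives \b and treats it as a word boundary.
raw = r"\bx.*z\b"

print(len(plain), len(raw))    # 6 8
print(plain[0] == "\x08")      # True
print(re.findall(plain, s))    # [] (there is no backspace in s)
print(re.findall(raw, s) == [s])  # True: the greedy .* spans the whole string
```

Note that the raw string alone does not give the desired output: with `.*` the match still runs from the first x to the last z, which is why the dot has to be replaced as well.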

So, you may use

s = "xaz xazx xaxsza zsxdaszdx zasxz xaaz xaaaz"
pattern = r"\bx\w*z\b"
print(re.findall(pattern, s))
# => ['xaz', 'xaaz', 'xaaaz']


If you want to match words consisting of letters only, use r"\bx[^\W\d_]*z\b".
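A small sketch of how the two patterns differ; the sample string `text` is made up for this illustration and is not from the question.

```python
import re

# Hypothetical sample containing digits and underscores inside "words".
text = "xaz x1z x_z xaaz x9az"

# \w also accepts digits and the underscore.
word_chars = re.findall(r"\bx\w*z\b", text)

# [^\W\d_] accepts word characters that are neither digits nor "_",
# which leaves letters only.
letters_only = re.findall(r"\bx[^\W\d_]*z\b", text)

print(word_chars)    # ['xaz', 'x1z', 'x_z', 'xaaz', 'x9az']
print(letters_only)  # ['xaz', 'xaaz']
```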

Pattern details:

  • \b - a leading word boundary
  • x - an x
  • \w* - 0+ word chars (letters, digits, _ ); the [^\W\d_] construct matches any letter only, because digits and _ are subtracted through the "double negative" construction
  • z - a z
  • \b - a trailing word boundary.

Note that if your "words" are separated only by whitespace, you can get the same result without a regex:

[x for x in s.split() if x.startswith('x') and x.endswith('z')]
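The whitespace condition matters. As a rough illustration, with a made-up input that contains punctuation (not from the question), the two approaches give different results:

```python
import re

# Hypothetical input: punctuation is attached to some of the words.
s = "xaz, xazx xaaz. xaaaz"

# split() keeps the punctuation on the token, so "xaz," and "xaaz."
# no longer end with "z".
by_split = [w for w in s.split() if w.startswith("x") and w.endswith("z")]

# The word boundary sits between "z" and the punctuation, so the regex
# still finds these words.
by_regex = re.findall(r"\bx\w*z\b", s)

print(by_split)  # ['xaaaz']
print(by_regex)  # ['xaz', 'xaaz', 'xaaaz']
```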


Regex: \bx\S+z\b

Demo: https://regex101.com/r/XuJybA/2

  1. Search for words using the word boundary: \b
  2. Check that the word begins with x
  3. Then match one or more non-whitespace characters: \S+
  4. And make sure the word ends with z, followed by another word boundary: \b