How can I fix my regex pattern to match every word that starts with "X" and ends with "Z"?
Code:
import re
#input
s = "xaz xazx xaxsza zsxdaszdx zasxz xaaz xaaaz"
pattern1 = "x.*z"
pattern2 = "\bx.*z\b"
pattern3 = "x.*?z"
pattern4 = "\b^x.*z$\b"
pattern5 = "\Bx.*z\B"
#also tried using \s, \S, ^ and $...
re.findall(pattern, s)
Desired Output:
out = ["xaz", "xaaz", "xaaaz"]
How can I achieve this?
A couple of notes on your patterns:
"x.*z"
- an x, then any chars other than line break chars, as many as possible, up to the last occurrence of z
"\bx.*z\b"
- a backspace symbol, then the same as above, and again a backspace symbol
"x.*?z"
- an x, then any chars other than line break chars, as few as possible, up to the first occurrence of z
"\b^x.*z$\b"
- a backspace symbol followed by the start of the string, which already guarantees a failure, then any 0+ chars up to the last z followed by the end of the string, and then a backspace symbol
"\Bx.*z\B"
- a non-word boundary, an x, then any 0+ chars up to the last z that is not followed by a word boundary.
You need to use a raw string literal so that \b denotes a word boundary.
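The notes above can be checked with a short sketch. It reuses the string s from the question; the variable names greedy, lazy, backspace and fixed are only illustrative.

```python
import re

s = "xaz xazx xaxsza zsxdaszdx zasxz xaaz xaaaz"

# In a regular string literal, "\b" is a single backspace character (U+0008).
assert "\b" == "\x08" and len("\b") == 1
# In a raw string literal, r"\b" is a backslash plus "b", which re reads as a word boundary.
assert r"\b" == "\\b" and len(r"\b") == 2

greedy = re.findall("x.*z", s)          # one match running from the first x to the last z
lazy = re.findall("x.*?z", s)           # shortest matches, which may start in the middle of a word
backspace = re.findall("\bx.*z\b", s)   # looks for literal backspace chars, so nothing matches
fixed = re.findall(r"\bx\w*z\b", s)     # raw string: \b is a word boundary

print(greedy == [s])  # True: the greedy pattern swallows the whole string
print(lazy)
print(backspace)      # []
print(fixed)          # ['xaz', 'xaaz', 'xaaaz']
```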
So, you may use
import re
s = "xaz xazx xaxsza zsxdaszdx zasxz xaaz xaaaz"
pattern = r"\bx\w*z\b"
print(re.findall(pattern, s))
# => ['xaz', 'xaaz', 'xaaaz']
If you want to match words with letters only, use r"\bx[^\W\d_]*z\b".
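To see how the letters-only variant differs from the \w* one, here is a small sketch. The sample string is hypothetical (it is not from the question) and was chosen to include a digit and an underscore.

```python
import re

# Hypothetical sample text: words containing a letter, a digit, an underscore, or nothing between x and z.
sample = "xaz x1z x_z xz"

word_chars = re.findall(r"\bx\w*z\b", sample)            # \w also accepts digits and _
letters_only = re.findall(r"\bx[^\W\d_]*z\b", sample)    # only letters between x and z

print(word_chars)    # ['xaz', 'x1z', 'x_z', 'xz']
print(letters_only)  # ['xaz', 'xz']
```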
Pattern details:
\b
- a leading word boundary
x
- an x
\w*
- 0+ word chars (letters/digits/_); in the letters-only variant, the [^\W\d_] construct matches any letter, since digits and _ are subtracted in the "double negative" construction
z
- a z
\b
- a trailing word boundary.
Note that if you only have "words" separated by whitespace, you may get the results with
[x for x in s.split() if x.startswith('x') and x.endswith('z')]
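The caveat about whitespace-separated "words" matters once punctuation is attached to them. A minimal sketch, assuming a hypothetical input string that is not from the question:

```python
import re

# Hypothetical input: the same kind of words, but with punctuation attached.
text = "xaz, (xaaz) xaaaz."

# str.split() keeps the punctuation on each token, so the startswith/endswith test fails.
by_split = [w for w in text.split() if w.startswith("x") and w.endswith("z")]
# The word boundaries in the regex ignore the surrounding punctuation.
by_regex = re.findall(r"\bx\w*z\b", text)

print(by_split)  # []
print(by_regex)  # ['xaz', 'xaaz', 'xaaaz']
```

On the question's original string both approaches return the same three words, so the list comprehension is fine when the input really is space-separated.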
Regex: \bx\S+z\b
Demo: https://regex101.com/r/XuJybA/2
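This pattern is not fully equivalent to \bx\w*z\b: \S+ needs at least one character between x and z, and it accepts any non-whitespace character, not just word characters. A rough comparison on a hypothetical sample (not from the question):

```python
import re

# Hypothetical sample with an empty middle, a hyphen, and dots between x and z.
sample = "xz x-z xaz x.y.z"

with_word_chars = re.findall(r"\bx\w*z\b", sample)   # 0+ word chars between x and z
with_non_space = re.findall(r"\bx\S+z\b", sample)    # 1+ non-whitespace chars between x and z

print(with_word_chars)  # ['xz', 'xaz']
print(with_non_space)   # ['x-z', 'xaz', 'x.y.z']
```

On the string from the question both patterns give ['xaz', 'xaaz', 'xaaaz'], so the choice only matters for inputs like the ones above.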