I am trying to get all the substrings starting with the character 'm' and having 5 characters. I tried this code, but it's not working.
<code>
import re
str1 = "mouseeee mother mouse is beautiful creation"
r = re.compile("m[a-z]{5}$")
print(r.findall(str1))</code>
To extract words that start with a lowercase m and have 5 lowercase letters after it, use
import re
str1 = "mouseeee mother mouse is beautiful creation"
r = re.compile(r"\bm[a-z]{5}\b")
print(r.findall(str1)) # => ['mother']
See the Python demo. mouseeee has more than 6 letters and mouse has only 4 letters after the initial m, so neither is matched.
Pattern details:
\b - a word boundary
m - an m
[a-z]{5} - 5 ASCII lowercase letters
\b - a word boundary
To make the pattern case insensitive, pass the re.I flag to re.compile.
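As a rough illustration of the re.I flag mentioned above, here is a minimal sketch. The sample string is hypothetical and not taken from the question.

```python
import re

# Hypothetical sample text, not from the question.
sample = "Mother mouse Monday market"

# Same pattern, compiled without and with the case-insensitive flag.
case_sensitive = re.compile(r"\bm[a-z]{5}\b")
case_insensitive = re.compile(r"\bm[a-z]{5}\b", re.I)

sensitive_hits = case_sensitive.findall(sample)
insensitive_hits = case_insensitive.findall(sample)

print(sensitive_hits)    # ['market']
print(insensitive_hits)  # ['Mother', 'Monday', 'market']
```

With re.I, both the leading m and the [a-z] class also match uppercase letters, so capitalized six-letter words are picked up as well.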
Edit: added suggestions by Wiktor Stribiżew
If you want to get all separate words of exactly length 6 starting with the letter m, you could use:
r = re.compile(r"(?<!\w)(m[a-z]{5})(?!\w)")
This ensures that there is no word character directly before or after the match (using a negative lookbehind and a negative lookahead). The match itself consists of the letter m followed by 5 other letters. The lookarounds can be simplified by using \b for word boundaries, as presented in the other answers.
>>> import re
>>> str1 = "mouseeee mother mouse is beautiful creation"
>>> r = re.compile("(?<= )(m[a-z]{5})(?= )")
>>> print(r.findall(str1))
['mother']
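The space-based lookarounds in the session above work for this input, but they may miss words at the very start or end of a string, or next to punctuation. A small sketch comparing the three variants on a hypothetical sample string (an assumption, not from the question):

```python
import re

# Hypothetical sample: six-letter m-words sit at the very start and very end.
sample = "mother mouse, mentor"

space_based = re.compile(r"(?<= )(m[a-z]{5})(?= )")        # requires a literal space on both sides
word_char_based = re.compile(r"(?<!\w)(m[a-z]{5})(?!\w)")  # forbids a word character on both sides
boundary_based = re.compile(r"\bm[a-z]{5}\b")              # word boundaries

space_hits = space_based.findall(sample)
word_char_hits = word_char_based.findall(sample)
boundary_hits = boundary_based.findall(sample)

print(space_hits)      # []
print(word_char_hits)  # ['mother', 'mentor']
print(boundary_hits)   # ['mother', 'mentor']
```

The negative lookarounds and \b both succeed at the string edges because they only require the absence of a word character, not the presence of a space.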
You probably want the regex \bm[a-z]{5}\b (\b is the word boundary escape sequence).
Currently, in your regex $ denotes the end of the string. In addition, there's nothing in there to prevent the matching from starting in the middle of a word.
>>> import re
>>> str1 = "mouseeee mother mouse is beautiful creation"
>>> r = re.compile(r"\bm[a-z]{5}\b")
>>> r.findall(str1)
['mother']
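To make the two problems with the original pattern concrete, the following sketch runs the question's string through three patterns: the original one, the same pattern without $, and the word-boundary version.

```python
import re

str1 = "mouseeee mother mouse is beautiful creation"

# Original pattern: $ anchors the match to the end of the string,
# and the string ends in "creation", so nothing matches.
anchored = re.findall(r"m[a-z]{5}$", str1)

# Without $, nothing limits the word length, so the first six
# characters of "mouseeee" are matched as well.
unanchored = re.findall(r"m[a-z]{5}", str1)

# Word boundaries on both sides restrict matches to whole six-letter words.
bounded = re.findall(r"\bm[a-z]{5}\b", str1)

print(anchored)    # []
print(unanchored)  # ['mousee', 'mother']
print(bounded)     # ['mother']
```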