
How to limit substring size in Python using a regular expression

I am trying to get all the substrings that start with the character 'm' and have 5 characters. I tried this code, but it's not working:

import re
str1 = "mouseeee mother mouse is beautiful creation"
r = re.compile("m[a-z]{5}$")
print(r.findall(str1))

To extract whole words that start with a lowercase m followed by exactly 5 more letters, use:

import re
str1 = "mouseeee mother mouse is beautiful creation"
r = re.compile(r"\bm[a-z]{5}\b")
print(r.findall(str1)) # => ['mother']

See the Python demo. mouseeee has more than 6 letters and mouse has only 4 letters after the initial m, so neither is matched.
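The same idea can be generalised to any starting letter and word length. The following is a minimal sketch; the helper name find_words and its parameters are hypothetical, and it assumes first is a single lowercase ASCII letter (the leading \b only behaves as intended when first is a word character).

```python
import re

def find_words(text, first, total_len):
    """Hypothetical helper: return whole words that start with `first`
    and are exactly `total_len` lowercase ASCII letters long."""
    # re.escape guards against `first` being a regex metacharacter.
    # In the f-string, {{ and }} produce literal braces, so for
    # total_len == 6 the quantifier becomes {5}.
    pattern = rf"\b{re.escape(first)}[a-z]{{{total_len - 1}}}\b"
    return re.findall(pattern, text)

str1 = "mouseeee mother mouse is beautiful creation"
print(find_words(str1, "m", 6))  # ['mother']
print(find_words(str1, "m", 5))  # ['mouse']
print(find_words(str1, "m", 8))  # ['mouseeee']
```

Building the quantifier from total_len - 1 keeps the "m plus N more letters" reading of the question explicit.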

Pattern details:

  • \b - a word boundary
  • m - an m
  • [a-z]{5} - 5 ASCII lowercase letters
  • \b - a word boundary.
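To see why both word boundaries matter, here is a small comparison (a sketch using the question's own string) of the pattern with and without \b. Without the boundaries, the regex happily matches the first six letters of a longer word.

```python
import re

str1 = "mouseeee mother mouse is beautiful creation"

# Without word boundaries the pattern can match the first six letters
# of a longer word ("mousee" out of "mouseeee").
no_bounds = re.findall(r"m[a-z]{5}", str1)
# With \b on both sides, only whole six-letter words survive.
with_bounds = re.findall(r"\bm[a-z]{5}\b", str1)

print(no_bounds)    # ['mousee', 'mother']
print(with_bounds)  # ['mother']
```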

To make the pattern case-insensitive, pass the re.I flag to re.compile.
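A short illustration of the flag, using a made-up sample string. With re.I both the literal m and the [a-z] class also match uppercase letters, and findall returns the words in their original case.

```python
import re

text = "Mother MOUSES mouse mother"  # illustrative sample string

strict = re.findall(r"\bm[a-z]{5}\b", text)
# re.I is an alias for re.IGNORECASE.
r = re.compile(r"\bm[a-z]{5}\b", re.I)
relaxed = r.findall(text)

print(strict)   # ['mother']
print(relaxed)  # ['Mother', 'MOUSES', 'mother']
```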

Edit: added suggestions by Wiktor Stribiżew

If you want to get all separate words of exactly 6 characters that start with the letter m, you could use:

r = re.compile(r"(?<!\w)(m[a-z]{5})(?!\w)")

This ensures that there is no word character directly before or after the match (using a negative lookbehind and a negative lookahead), and the match itself consists of the letter m followed by 5 more letters. Both lookarounds can be replaced by \b word boundaries, as presented in the other answers.
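Because m and [a-z] only match word characters, the negative-lookaround pattern and the \b pattern should behave identically. The sketch below checks this on a few illustrative strings. For contrast, it also runs a variant that requires literal spaces around the word, which misses words at the start or end of the string and next to punctuation.

```python
import re

lookaround = re.compile(r"(?<!\w)(m[a-z]{5})(?!\w)")
boundary = re.compile(r"\bm[a-z]{5}\b")
spaces_only = re.compile(r"(?<= )(m[a-z]{5})(?= )")  # contrast only

samples = [
    "mouseeee mother mouse is beautiful creation",
    "mother is here",              # word at the start of the string
    "ask your mother",             # word at the end of the string
    "mother, mothers; hammocks!",  # punctuation and longer words
]

results = {}
for s in samples:
    # The lookaround pattern and the \b pattern agree on every sample.
    assert lookaround.findall(s) == boundary.findall(s)
    results[s] = (boundary.findall(s), spaces_only.findall(s))

for s, (with_b, with_spaces) in results.items():
    print(repr(s), with_b, with_spaces)
```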

>>> import re
>>> str1 = "mouseeee mother mouse is beautiful creation"
>>> r = re.compile(r"(?<!\w)(m[a-z]{5})(?!\w)")
>>> print(r.findall(str1))
['mother']

You probably want the regex \bm[a-z]{5}\b (\b is the word-boundary escape sequence).

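Currently, the $ in your regex denotes the end of the string, so the pattern can only match an m plus five letters at the very end of str1. Since str1 ends in "creation", nothing is found. In addition, nothing in the pattern prevents the match from starting in the middle of a word.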
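Both problems can be reproduced directly. This sketch runs the question's pattern against str1 and two illustrative strings, then shows that the word-boundary version rejects the mid-word match.

```python
import re

str1 = "mouseeee mother mouse is beautiful creation"

# $ anchors the match to the end of the string; str1 ends in "creation".
original = re.findall(r"m[a-z]{5}$", str1)
# The pattern only succeeds when the six letters sit at the very end.
at_end = re.findall(r"m[a-z]{5}$", "ask your mother")
# Without a leading \b the match can start in the middle of a word.
mid_word = re.findall(r"m[a-z]{5}$", "hammocks")
fixed = re.findall(r"\bm[a-z]{5}\b", "hammocks")

print(original)  # []
print(at_end)    # ['mother']
print(mid_word)  # ['mmocks']
print(fixed)     # []
```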

>>> import re
>>> str1 = "mouseeee mother mouse is beautiful creation"
>>> r = re.compile(r"\bm[a-z]{5}\b")
>>> r.findall(str1)
['mother']
