
Regex: match a word after a certain character

I would like to match the part of a word that comes after the character m or b.

For example, when the word is men, I would like to return en (only the part of the word that follows m); if the word is beetles, return eetles.

Initially I tried (m|b)\w+, but it matches the entire word men, not just en.

How do I write the regex in this case? Thank you!

You could get only the part you want by using a positive lookbehind that asserts that what is directly to the left is either m or b, written as the character class [mb] preceded by a word boundary \b:

(?<=\b[mb])\w+
  • (?<= Positive lookbehind, assert what is directly to the left is
  • \b[mb] Word boundary, match either m or b
  • ) Close lookbehind
  • \w+ Match 1 or more word chars
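Because the lookbehind is zero-width, the m or b is checked but never becomes part of the match. A minimal sketch using re.finditer to print each match together with its span makes that visible (the sample string is only an illustration):

```python
import re

pattern = re.compile(r"(?<=\b[mb])\w+")
text = "beetles men"

# Each match starts one position after the m/b: the lookbehind only
# asserts that the character is there, it does not consume it.
for match in pattern.finditer(text):
    print(match.group(), match.span())
# eetles (1, 7)
# en (9, 11)
```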


If nothing but whitespace (or the end of the string) may follow the word characters, you can assert a whitespace boundary on the right using (?!\S):

(?<=\b[mb])\w+(?!\S)
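To see what the extra (?!\S) changes, here is a small comparison on a made-up string where one word is followed by a comma. The comma is a non-whitespace character, so the stricter pattern rejects that word entirely rather than trimming the match:

```python
import re

text = "men, beetles"

# Without the lookahead, "men," still yields "en".
print(re.findall(r"(?<=\b[mb])\w+", text))        # ['en', 'eetles']

# (?!\S) requires whitespace or the end of the string right after the match,
# so "men," is skipped because a comma follows the word characters.
print(re.findall(r"(?<=\b[mb])\w+(?!\S)", text))  # ['eetles']
```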


Example code

import re

test_str = "beetles men"
regex = r"(?<=\b[mb])\w+"
# findall returns every non-overlapping match of the whole pattern
print(re.findall(regex, test_str))

Output

['eetles', 'en']

You may use

\b[mb](\w+)


NOTE: When your known prefixes include multicharacter sequences, say, you want to find words starting with m or be, you will have to use a non-capturing group rather than a character class: \b(?:m|be)(\w+). The current solution can thus also be written as \b(?:m|b)(\w+) (however, a character class looks more natural here, unless you have to build the regex dynamically).
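When the prefixes come from data, the non-capturing group can be assembled at runtime. A sketch, assuming a hypothetical helper named prefix_pattern (not part of any library): re.escape guards against prefixes that contain regex metacharacters, and longer prefixes are placed first in the alternation so that, say, be is tried before b.

```python
import re

def prefix_pattern(prefixes):
    # Hypothetical helper: builds a word boundary + non-capturing group
    # of the given prefixes, followed by a capturing group for the rest.
    # Longest first, so "be" is tried before "b" in the alternation.
    ordered = sorted(prefixes, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")(\w+)")

rx = prefix_pattern(["m", "be"])
print(rx.pattern)                                 # \b(?:be|m)(\w+)
print(rx.findall("men beetles bats"))             # ['en', 'etles']

# Ordering matters when one prefix is a prefix of another.
print(prefix_pattern(["b", "be"]).findall("beetles"))  # ['etles']
```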

Details

  • \b - a word boundary
  • [mb] - m or b
  • (\w+) - Capturing group 1: any one or more word chars, letters, digits or underscores. To match only letters, use ([^\W\d_]+) instead.
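To see the difference between (\w+) and the letters-only ([^\W\d_]+) from the last bullet, here is a quick comparison on an invented string that mixes digits and underscores into words:

```python
import re

text = "m2x b_1 mat"

# \w+ accepts digits and underscores after the prefix letter.
print(re.findall(r"\b[mb](\w+)", text))        # ['2x', '_1', 'at']

# [^\W\d_] means "a word char that is not a digit or underscore", i.e. a letter,
# so words whose next character is 2 or _ no longer match.
print(re.findall(r"\b[mb]([^\W\d_]+)", text))  # ['at']
```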

Python demo:

import re
rx = re.compile(r'\b[mb](\w+)')
text = "The words are men and beetles."
# First occurrence:
m = rx.search(text)
if m:
    print(m.group(1))     # => en
# All occurrences
print(rx.findall(text))  # => ['en', 'eetles']
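One detail the demo relies on: when a pattern has exactly one capturing group, re.findall returns that group's text rather than the whole match. That is also why the (m|b)\w+ attempt from the question does not help on its own, since its only group is the prefix letter. A small comparison:

```python
import re

text = "The words are men and beetles."
rx = re.compile(r"\b[mb](\w+)")

# group(0) is the whole match; group(1) is only the captured part.
print([m.group(0) for m in rx.finditer(text)])  # ['men', 'beetles']
print(rx.findall(text))                         # ['en', 'eetles']

# The pattern from the question: findall returns its one group, the prefix.
print(re.findall(r"(m|b)\w+", text))            # ['m', 'b']
```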

(?<=[mb])\w+

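You can use the regex above. It matches the word characters that directly follow an m or b. Note that, unlike the patterns above, it has no word boundary, so the m or b does not have to be at the start of a word.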

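  1. (?<=[mb]) : positive lookbehind asserting that the preceding character is m or b
  2. \w+ : matches one or more word characters (letters, digits and underscores; for ASCII text, equivalent to [a-zA-Z0-9_]+)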
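Because this version has no \b inside the lookbehind, an m or b in the middle of a word also counts as a prefix. A quick check on an invented string, compared with the word-boundary version:

```python
import re

text = "number men"

# No word boundary: the "m" in the middle of "number" also qualifies.
print(re.findall(r"(?<=[mb])\w+", text))    # ['ber', 'en']

# With \b in the lookbehind, only an m/b at the start of a word qualifies.
print(re.findall(r"(?<=\b[mb])\w+", text))  # ['en']
```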

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM