
Regex: match a word after a certain character

I would like to match a word when it comes after the character m or b.

For example, when the word is men, I would like to return en (only the part that follows m); if the word is beetles, return eetles.

Initially I tried (m|b)\w+, but it matches the entire word men, not just en.

How do I write the regular expression in this case? Thank you!
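For reference, the attempt above can be reproduced in Python. The question does not name a language, so Python is an assumption here, chosen because the answers below use it. A minimal sketch showing that the overall match still includes the prefix, because the group (m|b) consumes it:

```python
import re

# The pattern from the question: the group (m|b) consumes the first letter,
# so that letter is part of the overall match.
attempt = re.compile(r"(m|b)\w+")

m = attempt.search("men")
whole_match = m.group(0)   # the entire match, prefix included
prefix_only = m.group(1)   # what the capturing group itself matched

print(whole_match)  # men
print(prefix_only)  # m
```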

You can get just that match with a positive lookbehind asserting that what is directly to the left is either m or b, written as the character class [mb], preceded by a word boundary \b:

(?<=\b[mb])\w+
  • (?<= Positive lookbehind, assert that what is directly to the left is
  • \b[mb] Word boundary, then match either m or b
  • ) Close lookbehind
  • \w+ Match 1+ word chars


If nothing other than whitespace (or the end of the string) may follow the word characters, you can assert a whitespace boundary on the right using (?!\S):

(?<=\b[mb])\w+(?!\S)
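A small sketch of what the (?!\S) lookahead changes, using a made-up sample string in which the first word is followed by a comma. Because the lookahead fails wherever a non-whitespace character follows, that first word yields no match at all:

```python
import re

# Sample text is made up for illustration: "men," is followed by a comma.
text = "men, beetles men"

# Without the lookahead, every word starting with m or b contributes a match.
plain = re.findall(r"(?<=\b[mb])\w+", text)

# With (?!\S), the match must be followed by whitespace or the end of the string.
strict = re.findall(r"(?<=\b[mb])\w+(?!\S)", text)

print(plain)   # ['en', 'eetles', 'en']
print(strict)  # ['eetles', 'en']
```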


Example code

import re

test_str = ("beetles men")
regex = r"(?<=\b[mb])\w+"
print(re.findall(regex, test_str))

Output

['eetles', 'en']

You may use

\b[mb](\w+)


NOTE: When your known prefixes include multicharacter sequences, say, you want to find words starting with m or be, you will have to use a non-capturing group rather than a character class: \b(?:m|be)(\w+). The current solution can thus be written as \b(?:m|b)(\w+) (however, a character class looks more natural here, unless you have to build the regex dynamically).
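As a rough illustration of building such a pattern dynamically, here is a minimal sketch. The helper name build_prefix_regex and the sample text are hypothetical, not part of the original answer or of the re module:

```python
import re

# build_prefix_regex is a hypothetical helper, not part of the re module.
def build_prefix_regex(prefixes):
    # Try longer prefixes first so that "be" wins over "b" if both are listed,
    # and escape each prefix so it is matched literally.
    ordered = sorted(prefixes, key=len, reverse=True)
    alternation = "|".join(re.escape(p) for p in ordered)
    return re.compile(r"\b(?:" + alternation + r")(\w+)")

rx = build_prefix_regex(["m", "be"])
found = rx.findall("men beetles bat")
print(found)  # ['en', 'etles']
```

Here "bat" produces nothing because it starts with neither m nor be.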

Details

  • \b - a word boundary
  • [mb] - m or b
  • (\w+) - Capturing group 1: one or more word chars (letters, digits or underscores). To match only letters, use ([^\W\d_]+) instead.
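To see the difference between the two capturing groups, here is a short sketch on a made-up string that mixes plain words with tokens containing a digit or an underscore:

```python
import re

# Made-up sample: "m3n" contains a digit, "b_x" contains an underscore.
text = "men m3n beetles b_x"

# \w+ accepts letters, digits and underscores after the prefix.
any_word_chars = re.findall(r"\b[mb](\w+)", text)

# [^\W\d_]+ accepts letters only, so "m3n" and "b_x" give no match.
letters_only = re.findall(r"\b[mb]([^\W\d_]+)", text)

print(any_word_chars)  # ['en', '3n', 'eetles', '_x']
print(letters_only)    # ['en', 'eetles']
```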

Python demo:

import re
rx = re.compile(r'\b[mb](\w+)')
text = "The words are men and beetles."
# First occurrence:
m = rx.search(text)
if m:
    print(m.group(1))     # => en
# All occurrences
print( rx.findall(text) ) # => ['en', 'eetles']

(?<=[mb])\w+

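You can use the regex above. It matches the run of word characters that follows an m or b. Note that, without a leading \b, that m or b may sit anywhere inside a word, so the match is not limited to words that start with m or b.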

  1. (?<=[mb]) : positive lookbehind, asserts that the preceding character is m or b
  2. \w+ : matches one or more word characters (equal to [a-zA-Z0-9_]+ for ASCII text)
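A quick check of how this pattern behaves compared with the \b variant from the first answer, on a made-up string where m and b also occur in the middle of a word:

```python
import re

# Made-up sample: "number" has an m and a b in the middle of the word.
text = "number men"

# Without \b, the lookbehind is satisfied by the m inside "number".
without_boundary = re.findall(r"(?<=[mb])\w+", text)

# With \b, only an m or b at the start of a word counts.
with_boundary = re.findall(r"(?<=\b[mb])\w+", text)

print(without_boundary)  # ['ber', 'en']
print(with_boundary)     # ['en']

# \w also matches the underscore, so "a_b" is a single run of word characters.
underscore = re.findall(r"\w+", "a_b")
print(underscore)        # ['a_b']
```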
