Regex: match a word after a certain character
I would like to match a word when it comes after the character m or b.
So, for example, when the word is men, I would like to return en (only the part of the word that follows m); if the word is beetles, then return eetles.
Initially I tried (m|b)\w+ but it matches the entire men, not en.
How do I write a regex for this case? Thank you!
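To see why the first attempt falls short, here is a minimal sketch (the sample string is hypothetical). The m or b is consumed as part of the overall match, and because the pattern has one capturing group, re.findall returns only that group rather than the rest of the word.

```python
import re

text = "men beetles"  # hypothetical sample input

# The attempted pattern consumes the prefix letter as part of the match.
attempt = r"(m|b)\w+"

# group(0) is the whole match, prefix included.
print(re.search(attempt, text).group(0))  # men

# With exactly one capturing group, findall returns that group for each match.
print(re.findall(attempt, text))  # ['m', 'b']
```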
You could get only the match by using a positive lookbehind, asserting that what is directly on the left is either m or b using the character class [mb], preceded by a word boundary \b:
(?<=\b[mb])\w+
(?<= - Positive lookbehind, assert that what is directly to the left is
\b[mb] - A word boundary, then either m or b
) - Close the lookbehind
\w+ - Match 1+ word characters

If nothing but whitespace (or the end of the string) may follow the word characters, you can assert a whitespace boundary on the right using (?!\S):
(?<=\b[mb])\w+(?!\S)
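A small comparison of the two patterns, assuming a hypothetical input where "men" is followed by a comma: the (?!\S) variant skips the word because the character after it is not whitespace.

```python
import re

# Hypothetical sample: "men" is followed by a comma, "beetles" by the end of the string.
text = "men, mice and beetles"

plain = r"(?<=\b[mb])\w+"
strict = r"(?<=\b[mb])\w+(?!\S)"  # the next char must be whitespace or end of string

print(re.findall(plain, text))   # ['en', 'ice', 'eetles']
print(re.findall(strict, text))  # ['ice', 'eetles']
```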
Example code
import re
test_str = ("beetles men")
regex = r"(?<=\b[mb])\w+"
print(re.findall(regex, test_str))
Output
['eetles', 'en']
You may use
\b[mb](\w+)
NOTE: When your known prefixes include multicharacter sequences, say, you want to find words starting with m or be, you will have to use a non-capturing group rather than a character class: \b(?:m|be)(\w+). The current solution can thus be written as \b(?:m|b)(\w+) (however, a character class looks more natural here, unless you have to build the regex dynamically).
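Building the pattern dynamically can be sketched as follows; words_after_prefixes is a hypothetical helper, not part of any library. Each prefix is passed through re.escape, and longer prefixes are tried first so that "be" wins over "b" on "beetles". A capturing group is used instead of a lookbehind because Python's re module only accepts fixed-width lookbehinds, and m|be has two different widths.

```python
import re

def words_after_prefixes(text, prefixes):
    """Hypothetical helper: return the rest of each word that starts with one of `prefixes`."""
    # Escape each prefix and order longest first so "be" is tried before "b".
    ordered = sorted(prefixes, key=len, reverse=True)
    alternatives = "|".join(re.escape(p) for p in ordered)
    # Non-capturing group for the prefixes, capturing group 1 for the remainder.
    pattern = re.compile(rf"\b(?:{alternatives})(\w+)")
    return pattern.findall(text)

print(words_after_prefixes("men and beetles", ["m", "be"]))       # ['en', 'etles']
print(words_after_prefixes("men and beetles", ["m", "b", "be"]))  # ['en', 'etles']
```

Sorting by length matters only when one prefix is a prefix of another; with ["m", "b"] the result is the same as \b[mb](\w+).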
Details
\b - a word boundary
[mb] - m or b
(\w+) - Capturing group 1: one or more word chars (letters, digits or underscores). To match only letters, use ([^\W\d_]+) instead.
import re
rx = re.compile(r'\b[mb](\w+)')
text = "The words are men and beetles."
# First occurrence:
m = rx.search(text)
if m:
    print(m.group(1))  # => en
# All occurrences:
print(rx.findall(text))  # => ['en', 'eetles']
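The letters-only variant mentioned in the details can be compared with \w+ on a hypothetical input that contains a digit and an underscore. [^\W\d_] means "a word character that is neither a digit nor an underscore", so the match stops at the digit in "mix2" and fails on "b_side".

```python
import re

# Hypothetical sample with a digit and an underscore after the prefix.
text = "mix2 b_side beetles"

word_chars = re.compile(r"\b[mb](\w+)")
letters_only = re.compile(r"\b[mb]([^\W\d_]+)")

print(word_chars.findall(text))    # ['ix2', '_side', 'eetles']
print(letters_only.findall(text))  # ['ix', 'eetles']
```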
(?<=[mb])\w+
You can use the regex above. It matches one or more word characters that are directly preceded by m or b. Note that, unlike the patterns with \b, it also matches after an m or b in the middle of a word.
(?<=[mb]) : positive lookbehind, asserting that m or b is directly to the left
\w+ : matches one or more word characters (for ASCII text, equivalent to [a-zA-Z0-9_]+)
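The effect of the missing word boundary can be checked on a hypothetical input: in "jumbo", the lookbehind without \b is satisfied right after the m, so the tail "bo" is returned even though the word does not start with m or b.

```python
import re

text = "jumbo beetles"  # hypothetical sample input

no_boundary = r"(?<=[mb])\w+"
with_boundary = r"(?<=\b[mb])\w+"

print(re.findall(no_boundary, text))    # ['bo', 'eetles']
print(re.findall(with_boundary, text))  # ['eetles']
```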
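Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.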