Regular expression to match valid words at the start, end and middle of a sentence

Question

I have a particular problem with regular expressions. Consider this sentence of valid words:

sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*

I want to split these words up so that I can use each one separately in downstream operations. To do this I am currently using two regular expressions.

One that matches the word at the start of the sentence:

(?<=^)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)

and one that matches all the others:

(?<=\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)

It would be nice to know whether this could fit in one expression. It would save the looping.

Strangely enough, the obvious first try:

(?<=^|\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)

fails with an error:

Invalid regular expression: look-behind requires fixed-width pattern

I am using Python's re module, and pythex.org for validation.
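
For context, the error arises because Python's re module requires a look-behind to match a fixed number of characters, and the two alternatives differ in width: ^ is zero-width while \s consumes one character. One possible way around this, offered here as a sketch and not as part of the original post, is the negative look-behind (?<!\S), which is fixed-width and succeeds both at the start of the string and after whitespace. The variable names below are illustrative.

```python
import re

sentence = ("sphere_a [sS]phere_b [sS]pher* [sS]pher* "
            "sph[eE]* sphere_a ^sphe* ^sp[hH]er*")

# The "obvious first try" is rejected when the pattern is compiled,
# because ^ (width 0) and \s (width 1) have different widths.
try:
    re.compile(r"(?<=^|\s)(?P<pattern>[\w\^\?\*\[\]]+)(?=\s|$)")
    lookbehind_rejected = False
except re.error:
    lookbehind_rejected = True

# "Not preceded by a non-space character" is true at the start of the
# string and after whitespace; "not followed by a non-space character"
# is true before whitespace and at the end of the string.
token_re = re.compile(r"(?<!\S)(?P<pattern>[\w\^\?\*\[\]]+)(?!\S)")

tokens = [m.group("pattern") for m in token_re.finditer(sentence)]
print(tokens)
```

For this particular sentence the result is the same list that sentence.split() produces; the regular expression only becomes useful if tokens containing other characters should be skipped.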

Answer 1

You can split your patterns easily with:

regexs = 'sphere_a [sS]phere_b [sS]pher* [sS]pher* sph[eE]* sphere_a ^sphe* ^sp[hH]er*'.split()

Then you can iterate over the patterns like this:

import re

for regex in regexs:
    m = re.findall(regex, content)  # content: the text being searched

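But it will return duplicate matches, because some patterns appear more than once in the list (for example sphere_a and [sS]pher*) and several of them overlap.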
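
If the duplicates are unwanted, one option is to collect all matches and then drop repeats while keeping first-seen order. The following is a minimal sketch under assumptions: content is a made-up sample string, and the names all_matches and unique_matches are illustrative, not taken from the original answer.

```python
import re

regexs = ("sphere_a [sS]phere_b [sS]pher* [sS]pher* "
          "sph[eE]* sphere_a ^sphe* ^sp[hH]er*").split()

# Hypothetical text to search; replace with the real input.
content = "sphere_a Sphere_b"

all_matches = []
for regex in regexs:
    # Each token is used as a regular expression, so "r*" means
    # "zero or more r", not a shell-style wildcard.
    all_matches.extend(re.findall(regex, content))

# dict.fromkeys keeps the first occurrence of each key, in order.
unique_matches = list(dict.fromkeys(all_matches))
print(unique_matches)
```

Deduplicating the pattern list itself before looping (for example with dict.fromkeys(regexs)) would avoid the repeated work, but overlapping patterns can still yield the same match, so filtering the results is the more general fix.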

Answer 1: 2 votes, accepted, 2016-01-31 23:01:21