Regex to capture multiple words between two chosen words (Python)

Question

I am fairly new to regex and can't understand what I am doing wrong.

I have various tweets about women and am trying to capture sentences that contain certain words.

An example of a piece of text: "all women should be earning less within the workplace if you ask me"

I am trying to capture "women should be earning less within the workplace", and I have tried several regex patterns, including:

women(\w+\W+\s*\S*)workplace
women(\w+\W+\s*\S*){2,}workplace
\bwomen(\w+\W+\s*\S*){2,}workplace\b

From my understanding, this pattern should capture an unlimited number of word characters, spaces, or non-whitespace characters, repeated two or more times. I also added word-boundary anchors (\b) to see if that would help, but it didn't.

However, I get no matches at all. Could someone please explain what I am doing wrong?

Thanks.

Answer 1

If you are trying to capture everything between two keywords, try something like:

\bwomen\b.*\bworkplace\b
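For reference, here is a minimal Python sketch using the standard re module and the example tweet from the question. It shows why the question's patterns return no match: each repeated group starts with \w+, which needs a word character immediately after "women", but the next character is a space. It then runs the pattern above. The lazy capture group at the end is an added assumption, not part of the original answer, for when only the words between the two keywords are wanted.

```python
import re

text = "all women should be earning less within the workplace if you ask me"

# The three patterns from the question. Each group must begin with \w+,
# but the character right after "women" is a space, so none of them match.
question_patterns = [
    r"women(\w+\W+\s*\S*)workplace",
    r"women(\w+\W+\s*\S*){2,}workplace",
    r"\bwomen(\w+\W+\s*\S*){2,}workplace\b",
]
for pattern in question_patterns:
    print(pattern, "->", re.search(pattern, text))  # prints None for each

# The answer's pattern: match everything from "women" through "workplace".
match = re.search(r"\bwomen\b.*\bworkplace\b", text)
print(match.group(0))  # women should be earning less within the workplace

# Hypothetical variant: a lazy capture group that keeps only the words
# strictly between the two keywords.
between = re.search(r"\bwomen\b\s+(.*?)\s+\bworkplace\b", text)
print(between.group(1))  # should be earning less within the
```

Because .* matches any character except a newline, this pattern can also run across sentence boundaries when a tweet has several sentences.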

To capture the entire sentence that contains the two keywords, use something like:

\b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b

This assumes that sentences are separated by ., ?, or !. It will also incorrectly treat the period in abbreviations such as Ms. as a sentence boundary.
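A minimal Python sketch of the sentence-level approach. It assumes the sentence pattern is written as \b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b, where each [^.?!]* stops at ., ? or !, so a match cannot span two sentences. The sample strings are made up for illustration; the second one demonstrates the Ms. limitation mentioned above.

```python
import re

# Each [^.?!]* refuses to cross ".", "?" or "!", so a match stays inside one
# sentence (assuming those three characters are the only sentence endings).
SENTENCE_PATTERN = re.compile(r"\b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b")

# Made-up sample text containing several sentences.
tweets = ("Nice weather today. all women should be earning less within the "
          "workplace if you ask me! What do you think?")
print(SENTENCE_PATTERN.findall(tweets))
# ['all women should be earning less within the workplace if you ask me']

# The abbreviation caveat: the period in "Ms." is treated as a sentence end,
# so "Ms." is cut off from the captured sentence.
abbrev = "Ms. Smith says women deserve respect in the workplace."
print(SENTENCE_PATTERN.findall(abbrev))
# ['Smith says women deserve respect in the workplace']
```

Since the pattern has no capturing groups, findall returns the full matched sentences; the trailing \b drops the closing punctuation.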

Answer 1: score 4, accepted (2020-05-26 21:26:30)