
Regex capturing multiple words in between two chosen words python

I am fairly new to regex and can't understand what I am doing wrong.

I have various tweets about women and am trying to capture sentences that contain certain words.

An example of a piece of text: all women should be earning less within the workplace if you ask me

I am trying to capture "women should be earning less within the workplace", and I have tried several regex patterns, including:

women(\w+\W+\s*\S*)workplace
women(\w+\W+\s*\S*){2,}workplace
\bwomen(\w+\W+\s*\S*){2,}workplace\b

From my understanding, these patterns should capture an unlimited number of word characters, spaces, or non-whitespace characters, repeated at least twice. I also added the word-boundary anchor \b to see if that would help, but it didn't.

However, I get no matches at all. Could someone please explain what I am doing wrong?

Thanks.
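A minimal Python reproduction of the problem, assuming the tweet is held in a plain string and searched with re.search: all three patterns from the question return None. The likely reason is that each capturing group must begin with \w+, while the character immediately after "women" is a space, so the match fails before the group can consume anything.

```python
import re

text = "all women should be earning less within the workplace if you ask me"

# The three patterns from the question, unchanged.
patterns = [
    r"women(\w+\W+\s*\S*)workplace",
    r"women(\w+\W+\s*\S*){2,}workplace",
    r"\bwomen(\w+\W+\s*\S*){2,}workplace\b",
]

# Every group starts with \w+, but the next character after "women" is " ",
# which \w cannot match, so none of the patterns can get past that point.
results = [re.search(p, text) for p in patterns]
for p, m in zip(patterns, results):
    print(p, "->", m)  # each line ends with "-> None"
```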

If you are trying to capture everything between two keywords, try something like:

\bwomen\b.*\bworkplace\b
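A short Python sketch of the pattern above applied to the example tweet. The full match (group 0) is exactly the text the question asks for. The second search is a hypothetical variant, not part of the original answer: it adds a capturing group with a lazy .*? to pull out only the words strictly between the two keywords.

```python
import re

text = "all women should be earning less within the workplace if you ask me"

# .* is greedy: it runs to the end of the line and backtracks until
# \bworkplace\b can match, so the match ends at the last "workplace" on the line.
match = re.search(r"\bwomen\b.*\bworkplace\b", text)
print(match.group(0))  # women should be earning less within the workplace

# Hypothetical variant: a lazy capturing group returns only the inner words.
inner = re.search(r"\bwomen\b(.*?)\bworkplace\b", text)
print(inner.group(1).strip())  # should be earning less within the
```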

To capture the entire sentence that contains the two keywords, use something like:

\b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b
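A minimal sketch of the sentence-level search in Python. It assumes the intended pattern is \b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b, where each [^.?!]* means "a run of characters that are not sentence terminators"; the multi-sentence sample tweet is made up for illustration.

```python
import re

# Assumed reconstruction of the answer's sentence pattern.
SENTENCE_PATTERN = re.compile(r"\b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b")

# Hypothetical tweet containing several sentences.
tweet = "I disagree. All women should be earning less within the workplace if you ask me! Nobody agrees."

# [^.?!]* cannot cross ".", "?" or "!", so the match stays inside one sentence.
match = SENTENCE_PATTERN.search(tweet)
print(match.group(0))  # All women should be earning less within the workplace if you ask me
```

Because the character classes stop at terminators, the match never spills into the neighbouring sentences; re.findall with the same pattern would return one string per qualifying sentence.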

This assumes that sentences are separated by '.', '?', or '!'. It will also incorrectly treat the period in abbreviations such as "Ms." as a sentence boundary.
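A small Python illustration of the abbreviation limitation, again assuming the sentence pattern is \b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b; the sample text is invented. The period in "Ms." stops the character class, so the match begins at "Smith" and the title is lost.

```python
import re

# Assumed reconstruction of the answer's sentence pattern.
SENTENCE_PATTERN = re.compile(r"\b[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*\b")

# Hypothetical text where an abbreviation contains a period.
tweet = "Ms. Smith says women should be paid less in the workplace."

# [^.?!]* cannot cross the "." in "Ms.", so the sentence appears to start at "Smith".
match = SENTENCE_PATTERN.search(tweet)
print(match.group(0))  # Smith says women should be paid less in the workplace
```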

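Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.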

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM