Regex capturing multiple words in between two chosen words in Python
I am fairly new to regex and can't understand what I am doing wrong.
I have a collection of tweets about women and am trying to capture the sentences that contain certain words.
An example of a piece of text:
all women should be earning less within the workplace if you ask me
I am trying to capture
women should be earning less within the workplace
and have tried several regex patterns, including:
women(\w+\W+\s*\S*)workplace
women(\w+\W+\s*\S*){2,}workplace
\bwomen(\w+\W+\s*\S*){2,}workplace\b
From my understanding, this code should capture an unlimited number of word characters, spaces, or non-whitespace characters, repeated at least twice. I also added the word-boundary anchor to see if that would help, but it didn't.
However, I receive no matches at all. Could someone please explain what I am doing wrong?
Thanks.
If you are trying to capture everything between two keywords, try something like:
\bwomen\b.*\bworkplace\b
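As a rough illustration, the pattern can be applied in Python with `re.search`. The variable names below are only for this sketch. It also runs one of the patterns from the question to show why it returns nothing: `\w+` has to match directly after `women`, but the next character there is a space.

```python
import re

text = "all women should be earning less within the workplace if you ask me"

# One of the attempts from the question: \w+ must match immediately
# after "women", but the next character is a space, so nothing matches.
attempt = re.search(r"women(\w+\W+\s*\S*){2,}workplace", text)

# Word boundaries around each keyword, anything (greedy) in between.
match = re.search(r"\bwomen\b.*\bworkplace\b", text)
between = match.group(0) if match else None

print(attempt)
print(between)
```

Note that `.*` is greedy and does not cross newlines by default, so if `workplace` appears more than once on a line the match runs to the last occurrence; `.*?` would stop at the first one instead.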
To capture the entire sentence that contains the two keywords, use something like:
\b[^.?!]*\bwomen\b.*\bworkplace\b[^.?!]*\b
This assumes that sentences are separated with ., ?, or !. It will also incorrectly identify punctuation in abbreviations like Ms. as sentence boundaries.
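A minimal sketch of the sentence-level approach in Python follows. The sample text and names are made up for illustration, and the pattern is a variation rather than the exact one above: it uses `[^.?!]*` between the keywords as well, so the match cannot spill into a neighbouring sentence, and it optionally includes the closing punctuation mark. Both choices are assumptions on my part.

```python
import re

# Hypothetical sample text with three sentences.
text = ("I read the report. All women should be earning less within "
        "the workplace if you ask me! Do you agree?")

# [^.?!]* keeps the match inside one sentence: before the first keyword,
# between the two keywords, and after the second keyword.
sentence_re = re.compile(
    r"[^.?!]*\bwomen\b[^.?!]*\bworkplace\b[^.?!]*[.?!]?"
)

# strip() drops the space left over from the previous sentence boundary.
sentences = [m.group(0).strip() for m in sentence_re.finditer(text)]
print(sentences)
```

Because the boundary is just a character class, a period inside an abbreviation is still treated as the end of a sentence, as noted above.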