Python regex to match all variations of a keyword except when preceded by a capitalized word
I'm looking for a Python regex that matches all variations of a keyword, except when the keyword is preceded by a capitalized word, unless that capitalized word is the start of a sentence. It should also exclude words inside brackets.
For example:
keyword = 'public record'
string1 = 'Hello. His public records are available at city hall.' # match "public records": "His" starts a sentence, so its capitalization is ignored
string2 = 'his records are at Newsom Public Record DataBase' # no match
string3 = 'Public records may be available online' # match "Public records"
string4 = '[public records](http:/....)' # no match
So far I have tried:
pattern = f'(?<!\[)(?i)\\w*{keyword}\\w*' #Doesn't take into account preceding capitalized words
pattern = f'(?<![A-Z][\w-]\s)(?<!\[)(?i)\\w*{keyword}\\w*' #Doesn't work for capitalized words longer than 2 characters
You can specify the various allowed beginnings, i.e. start of sentence + capitalized word, non-capitalized word, or beginning of string, and then use a lookahead to assert that the keyword follows:
pattern = r'(\. [A-Z]\w* |\W[^A-Z]\w* |^)(?=[pP]ublic [rR]ecord)'
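As a rough check, the pattern above can be run against the four sample strings from the question. Note that it matches the allowed prefix rather than the keyword itself, so this sketch adds a second, hypothetical pattern (`keyword_re`) and a helper (`find_keyword`), neither of which is part of the original answer, to pull out the keyword variation starting where the prefix ends.

```python
import re

# Pattern from the answer: an allowed prefix followed by a lookahead for the keyword.
pattern = re.compile(r'(\. [A-Z]\w* |\W[^A-Z]\w* |^)(?=[pP]ublic [rR]ecord)')

# Assumption for illustration: the keyword plus any trailing word characters
# (e.g. the plural "s") is what should be reported.
keyword_re = re.compile(r'[pP]ublic [rR]ecord\w*')

def find_keyword(text):
    """Return the matched keyword variation, or None if the prefix rule rejects it."""
    m = pattern.search(text)
    if m is None:
        return None
    # The lookahead guarantees the keyword starts exactly where the prefix ends.
    return keyword_re.match(text, m.end()).group()

samples = [
    'Hello. His public records are available at city hall.',
    'his records are at Newsom Public Record DataBase',
    'Public records may be available online',
    '[public records](http:/....)',
]

for s in samples:
    print(repr(find_keyword(s)))
# 'public records'
# None
# 'Public records'
# None
```

The prefix alternatives assume single spaces between words and a period followed by one space at sentence boundaries, so text with other punctuation or spacing may need extra alternatives.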