Python Regex for ignoring a sentence with two consecutive upper case letters
I have a simple problem at hand: reject sentences that contain two or more consecutive capital letters, along with a few other grammar rules.
Issue: By definition, the regex should not match the string 'This is something with two CAPS.', but it does match.
Code:
''' Check if a given sentence conforms to given grammar rules
$ Rules
* Sentence must start with an uppercase character (e.g. Noun/ I/ We/ He etc.)
* Then lowercase character follows.
* There must be spaces between words.
* Then the sentence must end with a full stop(.) after a word.
* Two continuous spaces are not allowed.
* Two continuous upper case characters are not allowed.
* However the sentence can end after an upper case character.
'''
import re
# Returns true if sentence follows these rules else returns false
def check_sentence(sentence):
    checker = re.compile(r"^((^(?![A-Z][A-Z]+))([A-Z][a-z]+)(\s\w+)+\.$)")
    return checker.match(sentence)
print(check_sentence('This is something with two CAPS.'))
Output:
<_sre.SRE_Match object; span=(0, 32), match='This is something with two CAPS.'>
It's probably easier to write your regex in the negative (find the sentences that break a rule) than in the positive.
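def check_sentence(sentence):
    # Any of these means the sentence is bad: two capitals in a row, two spaces in a row, or a lowercase first letter
    checker = re.compile(r'([A-Z][A-Z]|[ ][ ]|^[a-z])')
    # Overall shape: capital then lowercase at the start, at least one space, full stop at the end
    check2 = re.compile(r'^[A-Z][a-z].* .*\.$')
    return not checker.findall(sentence) and check2.findall(sentence)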
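To make the negative approach easier to try out, here is a minimal sketch that applies the two patterns from this answer to a few sample sentences. The helper name `is_valid_sentence` and the sample strings are assumptions made for illustration; only the two regexes come from the answer.

```python
import re

# Patterns taken from the answer: one lists violations, one describes the required shape.
BAD = re.compile(r"[A-Z][A-Z]|[ ][ ]|^[a-z]")
SHAPE = re.compile(r"^[A-Z][a-z].* .*\.$")


def is_valid_sentence(sentence):  # hypothetical helper name
    """Return True when no violation is found and the overall shape matches."""
    return BAD.search(sentence) is None and SHAPE.search(sentence) is not None


samples = (
    "This is fine.",
    "This is something with two CAPS.",
    "Two  spaces here.",
    "this starts lowercase.",
)
for s in samples:
    print(repr(s), is_valid_sentence(s))
```

Using `search` and returning a plain boolean is a design choice of this sketch; the answer's version returns the result of `findall`, which is truthy or falsy in the same cases.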
Your negative lookahead is only applying to the beginning of the string being tested.
2nd Capturing Group (^(?![A-Z][A-Z]+))
^ asserts position at start of the string
Negative Lookahead (?![A-Z][A-Z]+)
"This will NOT fail."
"THIS will fail."
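The behaviour described in this answer can be checked directly. The sketch below compares the pattern from the question with one possible variant in which `.*` inside the lookahead lets it scan the whole string instead of only the first two characters. The names `ORIGINAL` and `FIXED` are hypothetical, and the variant is only one way to read the rules, not necessarily what the asker ended up using.

```python
import re

# Pattern from the question: the lookahead sits right after ^,
# so it only inspects the first two characters of the string.
ORIGINAL = re.compile(r"^((^(?![A-Z][A-Z]+))([A-Z][a-z]+)(\s\w+)+\.$)")

# Assumed variant: ".*" inside the lookahead makes it reject two
# consecutive capitals anywhere in the string.
FIXED = re.compile(r"^(?!.*[A-Z]{2})[A-Z][a-z]+(\s\w+)+\.$")

samples = (
    "This will NOT fail.",
    "THIS will fail.",
    "This is something with two CAPS.",
    "This is fine.",
)
for s in samples:
    print(repr(s), bool(ORIGINAL.match(s)), bool(FIXED.match(s)))
```

With the original pattern only the sentence that starts with two capitals is rejected by the lookahead; the variant also rejects capitals that appear later in the sentence.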