Python Regex for ignoring a sentence with two consecutive upper case letters
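I have a simple problem at hand: ignoring sentences that contain two or more consecutive uppercase letters, along with a few other grammar rules.
Problem: by definition, the regular expression should not match the string 'This is something with two CAPS.', but it does.
Code: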
''' Check if a given sentence conforms to given grammar rules
$ Rules
* Sentence must start with a Uppercase character (e.g. Noun/ I/ We/ He etc.)
* Then lowercase character follows.
* There must be spaces between words.
* Then the sentence must end with a full stop(.) after a word.
* Two continuous spaces are not allowed.
* Two continuous upper case characters are not allowed.
* However the sentence can end after an upper case character.
'''
import re
# Returns true if sentence follows these rules else returns false
def check_sentence(sentence):
    checker = re.compile(r"^((^(?![A-Z][A-Z]+))([A-Z][a-z]+)(\s\w+)+\.$)")
    return checker.match(sentence)
print(check_sentence('This is something with two CAPS.'))
Output:
<_sre.SRE_Match object; span=(0, 32), match='This is something with two CAPS.'>
It may be easier to write your regex in the negative (find everything that makes a sentence bad) than in the positive.
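def check_sentence(sentence):
    # Bad: two capitals in a row, two spaces in a row, or a lowercase first letter.
    checker = re.compile(r'([A-Z][A-Z]|[ ][ ]|^[a-z])')
    # Good: capital then lowercase, at least one space, and a final full stop.
    check2 = re.compile(r'^[A-Z][a-z].* .*\.$')
    return not checker.findall(sentence) and check2.findall(sentence)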
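A self-contained sketch of the negative approach, for trying it on a few inputs. It reuses the two patterns above unchanged. The name is_valid_sentence, the switch from findall to search, and the sample sentences are assumptions added here so that the function returns a plain True or False rather than a list.

```python
import re

# Patterns copied from the answer above.
BAD = re.compile(r"[A-Z][A-Z]|[ ][ ]|^[a-z]")
GOOD = re.compile(r"^[A-Z][a-z].* .*\.$")

def is_valid_sentence(sentence):
    """Hypothetical helper: True only if nothing bad is found and the overall shape is right."""
    return BAD.search(sentence) is None and GOOD.search(sentence) is not None

print(is_valid_sentence('This is something with two CAPS.'))  # False (two capitals in a row)
print(is_valid_sentence('This has  two spaces.'))             # False (double space)
print(is_valid_sentence('this starts lowercase.'))            # False (lowercase first letter)
print(is_valid_sentence('This is fine.'))                     # True
```

Because BAD is searched over the whole string, the offending capitals are caught wherever they occur, not only at the start.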
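Your negative lookahead only applies at the start of the string being tested.
2nd Capturing Group (^(?![A-Z][A-Z]+))
^
asserts position at start of the string
Negative Lookahead (?![A-Z][A-Z]+)
So the lookahead only rejects a string whose first two characters are both uppercase. Capitals later in the sentence, such as CAPS, are never examined by it and are matched by \w+. With your pattern:
"This will NOT fail."
"THIS will fail."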
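One possible way to keep the positive style of the original pattern, sketched under the assumption that the only change needed is to let the lookahead scan the whole sentence. Writing it as (?!.*[A-Z]{2}) makes it fail if two adjacent capitals appear anywhere after the start of the string. The name check_sentence_fixed and the sample sentences are illustrative and not from the original post.

```python
import re

# Assumed fix: ".*" inside the lookahead lets it look past the first word.
PATTERN = re.compile(r"^(?!.*[A-Z]{2})[A-Z][a-z]+(\s\w+)+\.$")

def check_sentence_fixed(sentence):
    """Hypothetical variant of check_sentence that returns True or False."""
    return PATTERN.match(sentence) is not None

print(check_sentence_fixed('This is something with two CAPS.'))  # False
print(check_sentence_fixed('This will NOT fail.'))               # False
print(check_sentence_fixed('This ends with A.'))                 # True
print(check_sentence_fixed('This is fine.'))                     # True
```

The rest of the pattern is unchanged, so a double space is still rejected because each \s must be followed directly by a word character.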