
Regular expression of a sentence

I'm trying to write a regular expression to represent a sentence with the following conditions: it starts with a capital letter, ends with a period (and only one period can appear), and is allowed to contain a comma or semicolon, but when it does, that character must appear as (letter)(semicolon)(space) or (letter)(comma)(space).

I've got the capital letter and period down. I have the idea for the code, but I think I'm not getting the syntax completely right...

In English, my expression for a sentence looks like this:

(capital letter) ((lowercase letter)(space) ((lowercase letter)(comma)(space))* 
((lowercase letter)(semicolon)(space)* )* (period)

I realize this ignores the case where the first letter of the sentence is immediately followed by a comma or semicolon, but it's safe to ignore that case.

Now when I try to code this in Python, I try the following (I've added whitespace to make things easier to read):

sentence = re.compile("^[A-Z]  [a-z\\s  (^[a-z];\\s$)* (^[a-z],\\s$)*]*  \.$")

I feel like it's a syntax issue... I'm not sure if I'm allowed to have the semicolon and comma portions inside of parentheses.

Sample inputs that match the definition:

"This is a sentence."
"Hello, world."
"Hi there; hi there."

Sample inputs that do not match the definition:

"i ate breakfast."
"This is , a sentence."
"What time is it?"
^(?!.*[;,]\S)(?!.* [;,])[A-Z][a-z\s,;]+\.$

It's easier to use lookaheads to reject invalid sentences. See the demo:

https://regex101.com/r/vV1wW6/36#python
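
For anyone who wants to try the lookahead pattern locally rather than on regex101, here is a minimal sketch that runs it over the sample inputs from the question. The names LOOKAHEAD_RE, SAMPLES and is_sentence are just illustrative choices, not part of the original answer.

```python
import re

# Pattern copied from the answer above.
#   (?!.*[;,]\S)  rejects a comma/semicolon followed by a non-space character
#   (?!.* [;,])   rejects a comma/semicolon preceded by a space
LOOKAHEAD_RE = re.compile(r'^(?!.*[;,]\S)(?!.* [;,])[A-Z][a-z\s,;]+\.$')

# Sample inputs from the question: three valid, then three invalid.
SAMPLES = [
    "This is a sentence.",
    "Hello, world.",
    "Hi there; hi there.",
    "i ate breakfast.",
    "This is , a sentence.",
    "What time is it?",
]


def is_sentence(text):
    """Return True if the whole string matches the lookahead pattern."""
    return LOOKAHEAD_RE.match(text) is not None


results = [is_sentence(sample) for sample in SAMPLES]
for sample, ok in zip(SAMPLES, results):
    print(f"{sample!r:<26} {'match' if ok else 'no match'}")
```

The two lookaheads only veto bad punctuation placement; the actual shape of the sentence is still checked by `[A-Z][a-z\s,;]+\.`.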

This would match what you said above.

^"[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]"$

This would match:

"This is a sentence."
"Hello, world."
"Hi there; hi there."

This won't match:

"i ate breakfast."
"This is , a sentence."
"What time is it?"
"I a ,d am."
"I a,d am."

If you don't need the surrounding double quotes, just remove them from the regex.


If you need the regex in Python, try this:

re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')

Python demo

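import re

# Sample inputs: the first three should match, the last three should not.
tests = [
    "This is a sentence.",
    "Hello, world.",
    "Hi there; hi there.",
    "i ate breakfast.",
    "This is , a sentence.",
    "What time is it?",
]
rex = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<![\s])[;,])*[.]$')
for test in tests:
    # Prints a match object for a valid sentence, None otherwise.
    print(rex.match(test))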

Output (from Python 2; Python 3 prints a different repr for match objects):

<_sre.SRE_Match object at 0x7f31225afb70>
<_sre.SRE_Match object at 0x7f31225afb70>
<_sre.SRE_Match object at 0x7f31225afb70>
None
None
None
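
Note that the pattern in the demo code drops the trailing (?=\s) lookahead that appears in the pattern given earlier in this answer. As a rough sketch of what that changes, the snippet below runs both variants over the eight sample strings listed above. FULL_RE, DEMO_RE and matches are names made up for this comparison.

```python
import re

# Variant with both lookarounds: the comma/semicolon must not follow
# whitespace and must be followed by whitespace.
FULL_RE = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')

# Variant used in the demo code: only the lookbehind is present.
DEMO_RE = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<![\s])[;,])*[.]$')

SAMPLES = [
    "This is a sentence.",
    "Hello, world.",
    "Hi there; hi there.",
    "i ate breakfast.",
    "This is , a sentence.",
    "What time is it?",
    "I a ,d am.",
    "I a,d am.",
]


def matches(pattern, text):
    """Return True if the compiled pattern matches the whole string."""
    return pattern.match(text) is not None


full_results = [matches(FULL_RE, s) for s in SAMPLES]
demo_results = [matches(DEMO_RE, s) for s in SAMPLES]

# List the inputs on which the two variants disagree.
disagreements = [s for s, f, d in zip(SAMPLES, full_results, demo_results) if f != d]
print(disagreements)
```

The two variants should agree on every sample except "I a,d am.", which only the variant without the lookahead accepts, so the lookahead is needed if a space must follow the comma or semicolon.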

I ended up modifying my regular expression to

"^[A-Z][a-z\s (a-z,\s)* (a-z;\s)*]*\.$"

and it ended up working just fine. Thanks for everyone's help!
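
One caveat, offered as a hedged note rather than a correction of the accepted result: inside square brackets, parentheses and * are literal characters, so the bracketed part of this final expression is a single character class (letters, whitespace, comma, semicolon, parentheses and asterisk). It accepts the three valid samples, but it does not seem to enforce the (letter)(comma)(space) rule. The sketch below checks that, and also tries a grouped alternative. GROUPED_RE is a hypothetical pattern written for this illustration, and it assumes words are separated by exactly one space.

```python
import re

# The final expression from the post, written as a raw string.
# Everything between [ and ] is one character class.
FINAL_RE = re.compile(r'^[A-Z][a-z\s (a-z,\s)* (a-z;\s)*]*\.$')

# Hypothetical grouped alternative (an assumption, not from the thread):
# a capital letter, optional lowercase letters, then repeated units of
# "optional comma/semicolon, one space, one lowercase word", then a period.
GROUPED_RE = re.compile(r'^[A-Z][a-z]*(?:[,;]? [a-z]+)*\.$')

SAMPLES = [
    "This is a sentence.",
    "Hello, world.",
    "Hi there; hi there.",
    "i ate breakfast.",
    "This is , a sentence.",
    "What time is it?",
]

final_results = [FINAL_RE.match(s) is not None for s in SAMPLES]
grouped_results = [GROUPED_RE.match(s) is not None for s in SAMPLES]

for sample, f, g in zip(SAMPLES, final_results, grouped_results):
    print(f"{sample!r:<26} final={f!s:<5} grouped={g}")

# Parentheses and * are literal inside the class, so this also matches.
odd_match = FINAL_RE.match("Hi (there)*.") is not None
print(odd_match)
```

Parentheses only act as grouping when they are outside a character class, which is why the grouped version uses (?: ... ) around the repeated unit instead of placing it inside [ ].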
