Regular expression for a sentence

Question

I'm trying to write a regular expression to represent a sentence with the following conditions: it starts with a capital letter, ends with a period (and only one period can appear), and is allowed to contain a comma or semicolon, but when it does, it must appear as (letter)(semicolon)(space) or (letter)(comma)(space).

I've got the capital letter and period down. I have the idea for the code, but I think I'm not getting the syntax completely right...

In English, my expression for a sentence looks like this:

(capital letter) ((lowercase letter)(space) ((lowercase letter)(comma)(space))* 
((lowercase letter)(semicolon)(space)* )* (period)

I realize this ignores the case where the first letter of the sentence is immediately followed by a comma or semicolon, but it's safe to ignore that case.

Now when I try to code this in Python, I try the following (I've added whitespace to make things easier to read):

sentence = re.compile("^[A-Z]  [a-z\\s  (^[a-z];\\s$)* (^[a-z],\\s$)*]*  \.$")

I feel like it's a syntax issue... I'm not sure if I'm allowed to have the semicolon and comma portions inside of parentheses.

Sample inputs that match the definition:

"This is a sentence."
"Hello, world."
"Hi there; hi there."

Sample inputs that do not match the definition:

"i ate breakfast."
"This is , a sentence."
"What time is it?"

Answer 1

^(?!.*[;,]\S)(?!.* [;,])[A-Z][a-z\s,;]+\.$

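It's easier to use negative lookaheads to rule out invalid sentences: the first rejects any comma or semicolon that is not followed by whitespace, the second rejects any that is preceded by a space. See the demo: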

https://regex101.com/r/vV1wW6/36#python
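
For anyone who wants to check this locally rather than on regex101, here is a minimal sketch that runs the pattern above against the sample inputs from the question. The variable names (`pattern`, `valid`, `invalid`) are just illustrative choices, not part of the original answer.

```python
import re

# Pattern from this answer: two negative lookaheads reject misplaced
# punctuation, then the body allows lowercase letters, whitespace, ',' and ';'.
pattern = re.compile(r'^(?!.*[;,]\S)(?!.* [;,])[A-Z][a-z\s,;]+\.$')

# Sample inputs taken from the question.
valid = ["This is a sentence.", "Hello, world.", "Hi there; hi there."]
invalid = ["i ate breakfast.", "This is , a sentence.", "What time is it?"]

for text in valid + invalid:
    # Prints True for accepted sentences and False for rejected ones.
    print(repr(text), bool(pattern.match(text)))
```

The three valid samples should print True and the three invalid ones False.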

Answer 2

This would match what you said above.

^"[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]"$

This would match:

"This is a sentence."
"Hello, world."
"Hi there; hi there."

This won't match:

"i ate breakfast."
"This is , a sentence."
"What time is it?"
"I a ,d am."
"I a,d am."

If you don't need the surrounding double quotes, just remove them from the regex.

If you need the regex in Python, try this:

re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')

Python demo

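import re

tests = [
    "This is a sentence.",
    "Hello, world.",
    "Hi there; hi there.",
    "i ate breakfast.",
    "This is , a sentence.",
    "What time is it?",
]
# Same pattern as above, minus the lookahead that requires whitespace after [;,].
rex = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<![\s])[;,])*[.]$')
for test in tests:
    # A match object is printed for accepted sentences, None otherwise.
    print(rex.match(test))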

Output (from Python 2; newer Python versions print a different match-object repr for the first three lines):

<_sre.SRE_Match object at 0x7f31225afb70>
<_sre.SRE_Match object at 0x7f31225afb70>
<_sre.SRE_Match object at 0x7f31225afb70>
None
None
None
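
The two extra non-matching cases listed above ("I a ,d am." and "I a,d am.") are not covered by the demo, so here is a small sketch that checks them. It compares the pattern with the `(?=\s)` lookahead against the demo pattern without it; the names `with_lookahead` and `without_lookahead` are my own labels for illustration.

```python
import re

# Pattern as given for Python in this answer (whitespace required after [;,]).
with_lookahead = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<!\s)[;,](?=\s))*[.]$')
# Pattern as used in the demo code (no lookahead after [;,]).
without_lookahead = re.compile(r'^[A-Z][a-z]*(\s*|[a-z]*|(?<![\s])[;,])*[.]$')

for text in ["I a ,d am.", "I a,d am."]:
    print(repr(text),
          bool(with_lookahead.match(text)),
          bool(without_lookahead.match(text)))
```

Only the version with the lookahead rejects "I a,d am.", since that is the part that requires whitespace after the comma; both versions reject "I a ,d am." because of the lookbehind.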

Answer 3

I ended up modifying my regular expression to

"^[A-Z][a-z\s (a-z,\s)* (a-z;\s)*]*\.$"

and it ended up working just fine. Thanks for everyone's help!
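
A note for later readers, offered as a sketch rather than a verdict: inside square brackets, parentheses and `*` are literal characters, so the pattern above behaves like one big character class. The `explicit` pattern below is my own spelled-out version of that class, and the extra sample "Hello (world." is a made-up input for illustration.

```python
import re

# Pattern exactly as posted in this answer.
answer3 = re.compile(r"^[A-Z][a-z\s (a-z,\s)* (a-z;\s)*]*\.$")
# The same character class written out: letters, whitespace, ',', ';', '(', ')', '*'.
explicit = re.compile(r"^[A-Z][a-z\s,;()*]*\.$")

samples = [
    "This is a sentence.",
    "Hello, world.",
    "This is , a sentence.",  # should be rejected per the question
    "Hello (world.",          # hypothetical input containing a parenthesis
    "i ate breakfast.",
]

for text in samples:
    print(repr(text), bool(answer3.match(text)), bool(explicit.match(text)))
```

Both patterns agree on every sample, and both accept "This is , a sentence.", so this version does not enforce the (letter)(comma)(space) rule from the question.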

Answer 1
0 2015-09-30 05:16:41

Answer 2
0 accepted 2015-09-30 05:27:00

Answer 3
-1 2015-09-30 05:36:47
