RegEx - Not parsing dot(.) at the end of a sentence

C# .Net4.5

I have the following regex:

^([0-9A-Z.]?[0-9a-z.]*\b\s*)+$

This should match a sentence in which each word may start with a capital letter but may not contain a capital after its first letter, and in which a dot (.) may appear anywhere.

The expression works with the following:

  • This Works
  • Th.is Wo.rks

But it doesn't work if the dot is at the end of a word:

  • Does not Work.
  • This. Does not Work

Why doesn't this work if the dot(.) is at the end of a word?

Why doesn't this work if the dot(.) is at the end of a word?

\b matches a word boundary, and your pattern has no place for a period after it. A period that is followed by whitespace or by the end of the string is not at a word boundary, so a full stop at the end of a word can never match.
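To see the boundary behaviour in isolation, here is a minimal C# sketch. The class and method names are made up for illustration; only the pattern itself comes from the question. It checks whether a dot is directly followed by a word boundary, then runs the original pattern over the four sample lines.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    // The pattern from the question, unchanged.
    private const string Original = @"^([0-9A-Z.]?[0-9a-z.]*\b\s*)+$";

    public static bool MatchesOriginal(string input)
    {
        return Regex.IsMatch(input, Original);
    }

    // True if some '.' in the input is immediately followed by a word boundary.
    public static bool DotFollowedByBoundary(string input)
    {
        return Regex.IsMatch(input, @"\.\b");
    }

    public static void Main()
    {
        // A dot inside a word has a word character after it, so \b succeeds there.
        Console.WriteLine(DotFollowedByBoundary("Th.is"));  // True
        // A dot at the end of the string (or before a space) has no boundary after it.
        Console.WriteLine(DotFollowedByBoundary("This."));  // False

        Console.WriteLine(MatchesOriginal("This Works"));           // True
        Console.WriteLine(MatchesOriginal("Th.is Wo.rks"));         // True
        Console.WriteLine(MatchesOriginal("Does not Work."));       // False
        Console.WriteLine(MatchesOriginal("This. Does not Work"));  // False
    }
}
```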


This seems closer:

^([0-9A-Z.]?[0-9a-z.]*(?:\b|\s)\.*)+$

I've added an alternation between word boundary and whitespace, (?:\b|\s), and allowed periods after it with \.*.

It matches all 4 of your sample lines.
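A quick way to check that claim from C# is sketched below. The class and method names are placeholders; the pattern is the one given in this answer. The extra input "ThIs Works" is my own addition, to confirm that a capital in the middle of a word is still rejected.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CloserPatternDemo
{
    // Pattern from this answer: word boundary OR whitespace, then optional periods.
    private const string Closer = @"^([0-9A-Z.]?[0-9a-z.]*(?:\b|\s)\.*)+$";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Closer);
    }

    public static void Main()
    {
        string[] samples =
        {
            "This Works",
            "Th.is Wo.rks",
            "Does not Work.",
            "This. Does not Work",
            "ThIs Works" // not from the question: capital in the middle of a word
        };

        foreach (string sample in samples)
        {
            Console.WriteLine("{0,-22} {1}", sample, Matches(sample));
        }
    }
}
```

The four lines from the question should print True and the last line False, because there is neither a word boundary nor whitespace between "h" and "I".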

This seems cleaner:

^([0-9A-Z.]?[0-9a-z.]*\s*)+$


You don't need the word boundary \b, since the characters are restricted
to [A-Za-z.\s]

Why not keep it simple and just enforce that [A-Z] can only exist on a whitespace
boundary? (Below, \s is replaced with \h for brevity.)

^\h*(?:(?<!\S)[A-Z]|[\da-z.\h]+)+$

Formatted and tested: 格式化和测试:

 ^                     # BOS
 \h*                   # Optional leading whitespace
 (?:                   # Cluster group start
      (?<! \S )             # Whitespace boundary before capital
      [A-Z]                 # Single capital letter
   |                      # or,
      [\da-z.\h]+           # Multiple digits, lower case letters, dots or whitespace
 )+                    # Cluster group end, do 1 to many times
 $                     # EOS
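Since the question is about C#, note that \h (horizontal whitespace) is a PCRE-style escape, and as far as I know the .NET engine rejects it as an unrecognized escape. The sketch below is an adaptation under that assumption, not the answer's exact pattern: it swaps \h for [ \t] (space or tab). The class name and the "ThIs Works" input are mine.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceBoundaryDemo
{
    // Assumed .NET adaptation: every \h in the answer's pattern is written as space/tab.
    private const string Adapted = @"^[ \t]*(?:(?<!\S)[A-Z]|[\da-z. \t]+)+$";

    public static bool Matches(string input)
    {
        return Regex.IsMatch(input, Adapted);
    }

    public static void Main()
    {
        Console.WriteLine(Matches("This Works"));           // True
        Console.WriteLine(Matches("Th.is Wo.rks"));         // True
        Console.WriteLine(Matches("Does not Work."));       // True
        Console.WriteLine(Matches("This. Does not Work"));  // True
        // 'I' is preceded by a non-whitespace character, so the lookbehind fails.
        Console.WriteLine(Matches("ThIs Works"));           // False
    }
}
```

The negative lookbehind (?<!\S) is what stands in for \b here: a capital is accepted only at the start of the string or directly after whitespace.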

Thanks for the help. I believe I finally have the answer:

^(\s*[0-9A-Z.]?[0-9a-z.]*\b\s*[.|\s]*)+$

The reason I need the \b is that the pattern must not match words that have capitals in the middle of the word. When the \b is removed, the pattern matches words with capitals in the middle of the word.
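The difference can be shown side by side with a small C# sketch. The class and member names are hypothetical; the two patterns are the final one above and the \b-free one suggested earlier in the thread. The inputs "ThIs Works" and "This|Works" are my own test strings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FinalPatternDemo
{
    // Final pattern from this post (keeps \b).
    private const string WithBoundary = @"^(\s*[0-9A-Z.]?[0-9a-z.]*\b\s*[.|\s]*)+$";

    // The "cleaner" suggestion from the thread (no \b).
    private const string WithoutBoundary = @"^([0-9A-Z.]?[0-9a-z.]*\s*)+$";

    public static bool MatchesWithBoundary(string input)
    {
        return Regex.IsMatch(input, WithBoundary);
    }

    public static bool MatchesWithoutBoundary(string input)
    {
        return Regex.IsMatch(input, WithoutBoundary);
    }

    public static void Main()
    {
        // All four sample lines from the question match the final pattern.
        Console.WriteLine(MatchesWithBoundary("This Works"));           // True
        Console.WriteLine(MatchesWithBoundary("Th.is Wo.rks"));         // True
        Console.WriteLine(MatchesWithBoundary("Does not Work."));       // True
        Console.WriteLine(MatchesWithBoundary("This. Does not Work"));  // True

        // A capital in the middle of a word: rejected with \b, accepted without it.
        Console.WriteLine(MatchesWithBoundary("ThIs Works"));     // False
        Console.WriteLine(MatchesWithoutBoundary("ThIs Works"));  // True

        // Inside [...] the '|' is a literal character, so it is accepted as well.
        Console.WriteLine(MatchesWithBoundary("This|Works"));     // True
    }
}
```

If the literal | is not wanted, writing the class as [.\s] would presumably be the intended form, but that is a guess at the intent rather than something stated in the post.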
