
Regex to match sentence with decimals and names

I feel I'm quite close on this, but as soon as I move the punctuation capture to the end of the sentence, it captures the wrong thing.

The sentence scenario is as follows:

This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. This is a  sentence      with odd   spacing. This is one with lots of exclamation marks at the end!!!!This is another with a decimal 10.00 in the middle. Why is it so hard to find sentence endings?Last sentence without a space at the start.

This should result in the following captures:

This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. 
This is a  sentence      with odd   spacing. 
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle. 
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
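
To make the difficulty concrete, here is a minimal C# sketch (not part of the original post) that splits the sample text with a naive pattern treating every run of `!?.;` as a sentence end. The names `NaiveSplitDemo`, `Split`, `Sample` and `NaivePattern` are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NaiveSplitDemo
{
    // Sample text from the question, spacing preserved.
    public const string Sample =
        "This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. " +
        "This is a  sentence      with odd   spacing. " +
        "This is one with lots of exclamation marks at the end!!!!" +
        "This is another with a decimal 10.00 in the middle. " +
        "Why is it so hard to find sentence endings?" +
        "Last sentence without a space at the start.";

    // Naive assumption: any run of ! ? . ; ends a sentence.
    public const string NaivePattern = @"(.*?[!?.;]+)\s*";

    // Hypothetical helper: collects group 1 of every match of the pattern.
    public static List<string> Split(string text, string pattern)
    {
        var pieces = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern))
            pieces.Add(m.Groups[1].Value);
        return pieces;
    }

    public static void Main()
    {
        // Print each fragment on its own line to show where the text gets cut.
        foreach (string piece in Split(Sample, NaivePattern))
            Console.WriteLine("[" + piece + "]");
    }
}
```

On the sample this over-splits: the text is cut after `Mr.`, `D.`, `J.` and `10.`, so it produces more than the six expected sentences. Those are exactly the cases the expression has to exclude.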

Here's my expression:

.*?(?:[!?.;]+)((?<!(Mr|Mrs|Dr|Rev).?)(?=\D|\s+|$)(?:[^!?.;\d]|\d*\.?\d+)*)(?=(?:[!?.;]+))

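There are currently two issues:

  1. The punctuation is captured at the start of each match rather than at the end.
  2. It correctly handles one name per sentence, but not two. (For bonus points, I'd like it to capture "Mr DJ Smith" correctly, but I can't figure out how to do that without it failing to match sentences that end in a single letter.)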

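The data going into it will be normalised to some degree, so we know it will end with a full stop and be on a single line, but any pointers are welcome.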

I agree with @spender, who suggests using a parser to handle all the punctuation rules.

However, the following will work for your scenario.

foreach (Match m in Regex.Matches(s, @"(.*?(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))[!?.;]+)\s*"))
    Console.WriteLine(m.Groups[1].Value);

Output

This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. 
This is a  sentence      with odd   spacing. 
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle. 
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
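
For anyone who wants to run this locally, here is a self-contained sketch built around the same pattern. The wrapper names (`SentenceSplitter`, `Split`, `Sample`) are hypothetical and only there to make the snippet compile on its own. The idea is that the negative lookbehind refuses to end a sentence at punctuation that directly follows a single capital letter, `Mr`, `Mrs`, `Dr`, `Rev` or a digit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Sample text from the question, spacing preserved.
    public const string Sample =
        "This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. " +
        "This is a  sentence      with odd   spacing. " +
        "This is one with lots of exclamation marks at the end!!!!" +
        "This is another with a decimal 10.00 in the middle. " +
        "Why is it so hard to find sentence endings?" +
        "Last sentence without a space at the start.";

    // Pattern from the answer. The lookbehind is variable-length,
    // which the .NET regex engine supports.
    private static readonly Regex Sentence = new Regex(
        @"(.*?(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))[!?.;]+)\s*");

    // Returns group 1 of each match: the sentence with its closing
    // punctuation, without the whitespace that follows it.
    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match m in Sentence.Matches(text))
            sentences.Add(m.Groups[1].Value);
        return sentences;
    }

    public static void Main()
    {
        foreach (string sentence in Split(Sample))
            Console.WriteLine(sentence);

        // Two initials in a row are skipped as well, so this stays one sentence.
        Console.WriteLine(Split("Mr D. J. Smith is here.").Count);

        // Trade-off: a sentence that really ends in a single capital letter
        // or a digit is not split there, so this also comes back as one piece.
        Console.WriteLine(Split("I chose plan B. It costs 10. Fine.").Count);
    }
}
```

The last call shows the cost of this approach: the rule that protects initials and decimals also swallows genuine sentence ends such as "plan B." or "costs 10.". That is probably acceptable for the normalised input described in the question, but it is another reason a proper parser may be the better long-term choice.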

Ideone Demo
