Regex to match sentence with decimals and names
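I think I'm pretty close with this, but as soon as I move the punctuation capture to the end of the sentence, it captures the wrong text.
The sample input is as follows: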
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. This is a sentence with odd spacing. This is one with lots of exclamation marks at the end!!!!This is another with a decimal 10.00 in the middle. Why is it so hard to find sentence endings?Last sentence without a space at the start.
This should result in the following captures:
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it.
This is a sentence with odd spacing.
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle.
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
This is my expression:
.*?(?:[!?.;]+)((?<!(Mr|Mrs|Dr|Rev).?)(?=\D|\s+|$)(?:[^!?.;\d]|\d*\.?\d+)*)(?=(?:[!?.;]+))
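There are currently two problems:
The data going into this will be normalised to some extent, so we know it will end with a full stop and be on a single line, but any pointers are welcome.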
I agree with @spender's suggestion to use a parser to handle all the punctuation rules.
However, the following will work for your scenario.
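// Lazily match up to a run of [!?.;] that is not preceded by a single capital letter, a title, or a digit.
foreach (Match m in Regex.Matches(s, @"(.*?(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))[!?.;]+)\s*"))
    Console.WriteLine(m.Groups[1].Value);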
Output:
This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it.
This is a sentence with odd spacing.
This is one with lots of exclamation marks at the end!!!!
This is another with a decimal 10.00 in the middle.
Why is it so hard to find sentence endings?
Last sentence without a space at the start.
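For anyone who wants to try this end to end, here is a minimal, self-contained sketch that wraps the same pattern in a small helper. The names SentenceSplitter, Split and Program are hypothetical and only there for illustration; the pattern itself is unchanged from the snippet above, and s is assumed to hold the sample text from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SentenceSplitter
{
    // Same pattern as in the answer: group 1 lazily captures text up to a run of
    // [!?.;] that is NOT directly preceded by a lone capital letter (an initial),
    // a title (Mr, Mrs, Dr, Rev) or a digit. Trailing whitespace is consumed
    // outside the group so it is not part of the captured sentence.
    private static readonly Regex SentencePattern =
        new Regex(@"(.*?(?<!(?:\b[A-Z]|Mrs?|Dr|Rev|\d))[!?.;]+)\s*");

    // Hypothetical helper: returns the captured sentences in order.
    public static List<string> Split(string text)
    {
        var sentences = new List<string>();
        foreach (Match m in SentencePattern.Matches(text))
            sentences.Add(m.Groups[1].Value);
        return sentences;
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample input from the question.
        string s = "This is a sentence with a name like Mr. D. Smith and Mr J. Smith in it. "
                 + "This is a sentence with odd spacing. "
                 + "This is one with lots of exclamation marks at the end!!!!"
                 + "This is another with a decimal 10.00 in the middle. "
                 + "Why is it so hard to find sentence endings?"
                 + "Last sentence without a space at the start.";

        foreach (string sentence in SentenceSplitter.Split(s))
            Console.WriteLine(sentence);
    }
}
```

Two limitations follow directly from the lookbehind and are worth keeping in mind: punctuation that comes straight after a digit or a single capital letter is never treated as a sentence end, so input such as "It costs 10. Next one." comes back as one sentence, and any trailing text with no closing punctuation is not captured at all. If the real data contains such cases, the parser approach mentioned above is probably the safer route.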