How to split a String while ignoring special strings
Suppose I have a string that needs to be split by {",", " "}. The given string contains full names that must not be split.
For example:
fullString = "Jhon Due was drawing the quick brown fox. Alex King draw a fox"
splitSeperatorArray = {",", " "}
ignoreSplitArray = {"Jhon Due", "Alex King"}
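The desired result is:
{"Jhon Due", "was", "drawing", "the", "quick", "brown", "fox", "Alex King", "draw", "a", "fox"}
I would appreciate pointers to the right approach, since this will be applied to large amounts of data.
Thanks!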
You can do this with a regular expression. Try this code:
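// Requires: using System.Linq; using System.Text.RegularExpressions;
var fullString = "Jhon Due was drawing,the quick brown fox. Alex King draw a fox";
var ignoreSplitArray = new[] {"Jhon Due", "Alex King"};
// Escape each name so that any regex metacharacters in it are matched literally.
var ignore = string.Join("|", ignoreSplitArray.Select(Regex.Escape));
// The capture group makes Split return the matched names as part of its output.
var regex = new Regex($" |,|({ignore})");
var result = regex.Split(fullString).Where(s => s.Length > 0).ToArray();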
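The same idea can be wrapped in a small helper that also takes the separators from an array, as in the question. This is only a sketch: `SplitKeeping` and its parameter names are made up for illustration, and it assumes both arrays are non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original answer): builds the pattern from
// both arrays and escapes every entry so it is matched literally.
static string[] SplitKeeping(string text, IEnumerable<string> separators, IEnumerable<string> keepWhole)
{
    // Longer names first, so a longer name wins over a shorter name it starts with.
    var names = keepWhole.OrderByDescending(n => n.Length).Select(Regex.Escape);
    var seps = separators.Select(Regex.Escape);
    // Names sit in a capture group, so Regex.Split returns them; separators are dropped.
    var pattern = $"({string.Join("|", names)})|{string.Join("|", seps)}";
    return Regex.Split(text, pattern).Where(s => s.Length > 0).ToArray();
}

var parts = SplitKeeping(
    "Jhon Due was drawing,the quick brown fox. Alex King draw a fox",
    new[] { ",", " " },
    new[] { "Jhon Due", "Alex King" });

Console.WriteLine(string.Join("; ", parts));
// Jhon Due; was; drawing; the; quick; brown; fox.; Alex King; draw; a; fox
```

Note that "fox." keeps its period, because "." is not in the separator list; add it to the separators if it should be dropped.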
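You cannot do this with Split alone. You could, for example, split first, then search the result for the special strings, and build the final result from that.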
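A minimal sketch of that split-then-merge idea, under some assumptions: `SplitThenMerge` is a made-up name, the separators are single characters, and two neighbouring tokens are merged whenever they spell a protected name, whichever separator originally stood between them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split on the separators, then re-join neighbouring
// tokens that together spell one of the protected names.
static List<string> SplitThenMerge(string text, char[] separators, string[] keepWhole)
{
    var tokens = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);

    // Each protected name with its own token sequence, e.g. "Jhon Due" -> ["Jhon", "Due"].
    var names = keepWhole
        .Select(n => (Full: n, Parts: n.Split(separators, StringSplitOptions.RemoveEmptyEntries)))
        .Where(n => n.Parts.Length > 0)
        .OrderByDescending(n => n.Parts.Length) // try longer names first
        .ToList();

    var result = new List<string>();
    var i = 0;
    while (i < tokens.Length)
    {
        // First name whose parts match the tokens starting at position i.
        var hit = names.FirstOrDefault(n =>
            i + n.Parts.Length <= tokens.Length &&
            new ArraySegment<string>(tokens, i, n.Parts.Length).SequenceEqual(n.Parts));

        if (hit.Full != null)
        {
            result.Add(hit.Full);   // emit the name as one item
            i += hit.Parts.Length;  // skip the tokens it consumed
        }
        else
        {
            result.Add(tokens[i]);
            i++;
        }
    }
    return result;
}

var merged = SplitThenMerge(
    "Jhon Due was drawing,the quick brown fox. Alex King draw a fox",
    new[] { ',', ' ' },
    new[] { "Jhon Due", "Alex King" });

Console.WriteLine(string.Join("; ", merged));
// Jhon Due; was; drawing; the; quick; brown; fox.; Alex King; draw; a; fox
```

With many protected names, the linear scan over `names` at every token becomes the bottleneck; indexing the names by their first token would be one way to reduce it.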
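I suggest using regular expressions instead of Split; we can try to describe names such as John Due and Alex King in the pattern itself, rather than maintaining an array of them (which is hard to do when processing big data):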
using System;
using System.Linq;
using System.Text.RegularExpressions;
string source = "John Due was drawing the quick brown fox. Alex King draws a fox";
// A run of capitalised words is taken as one name; otherwise match a single lowercase word.
string pattern = @"([A-Z][a-z]+(\s[A-Z][a-z]+)*)|([a-z]+)";
var result = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value);
Console.Write(string.Join("; ", result));
Result:
John Due; was; drawing; the; quick; brown; fox; Alex King; draws; a; fox
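One thing to keep in mind with this pattern is that it has no list of names to check against, so any run of capitalised words is treated as a name. The sketch below uses a made-up input sentence to show the effect when a capitalised sentence opener comes directly before a name.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string pattern = @"([A-Z][a-z]+(\s[A-Z][a-z]+)*)|([a-z]+)";

// Hypothetical input: "Yesterday" is capitalised only because it starts the sentence.
string source = "Yesterday John Due was drawing. Alex King draws a fox";

var tokens = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value)
    .ToArray();

Console.WriteLine(string.Join("; ", tokens));
// Yesterday John Due; was; drawing; Alex King; draws; a; fox
```

If that kind of merge is not acceptable for your data, an explicit list of names is the safer route.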
Edit: if digits (see comments) can appear in the text, they must be included in the pattern, e.g.
string source = "John111 Due was drawing the quick 123 brown fox. Alex King draws a fox";
string pattern = @"([A-Z][a-z0-9]+(\s[A-Z][a-z0-9]+)*)|([a-z0-9]+)";
...
Result:
John111 Due; was; drawing; the; quick; 123; brown; fox; Alex King; draws; a; fox
Another possibility:
string source =
    "John111 Due (Джон Дью) was drawing the quick 123 [۰۱۲] brown fox. Alex King draws a fox";
string pattern = @"(\p{Lu}\w+(\s\p{Lu}\w+)*)|(\w+)";
This variant is for when you want to extract non-English letters (e.g. Russian) and non-ASCII digits (e.g. Persian) as well.
Result:
John111 Due; Джон Дью; was; drawing; the; quick; 123; ۰۱۲; brown; fox; Alex King; draws; a; fox