I need to split a string on an array of separator characters without losing the separators in the result. For example:
string: "Hello world!"
separators: " !"
result: ("Hello", " ", "world", "!")
Of course, I can write something that walks through the string and returns the needed result, but isn't there something that already does this, like a suitably configured String.Split?
Upd: I need a solution without regular expressions, because they are very slow for me.
Use a regular expression. When the pattern contains a capturing group, Regex.Split also returns the captured separators:
string[] parts = Regex.Split(myString, yourPattern);
Test:
string[] parts = Regex.Split("Hello World!", "(!| )");
Output:
Hello
" " // just a space
World
!
"" // empty string, because the input ends with a separator
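A LINQ solution:
var s = "Hello world!";
char[] separators = { ' ', '!' };
string current = string.Empty;
List<string> result = s.Aggregate(new List<string>(), (list, ch) =>
{
    if (separators.Contains(ch))
    {
        // flush the text collected so far, skipping empty tokens
        if (current.Length > 0) list.Add(current);
        list.Add(ch.ToString());
        current = string.Empty;
    }
    else current += ch;
    return list;
});
// add the trailing token when the string does not end with a separator
if (current.Length > 0) result.Add(current);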
This would be a purely procedural solution:
private static IEnumerable<string> Tokenize(string text, string separators)
{
    int startIdx = 0;
    int currentIdx = 0;
    while (currentIdx < text.Length)
    {
        // found a separator?
        if (separators.Contains(text[currentIdx]))
        {
            // yield a substring, if it's not empty
            if (currentIdx > startIdx)
                yield return text.Substring(startIdx, currentIdx - startIdx);
            // yield the separator
            yield return text.Substring(currentIdx, 1);
            // mark the beginning of the next token
            startIdx = currentIdx + 1;
        }
        currentIdx++;
    }
    // yield whatever follows the last separator, if anything
    if (startIdx < text.Length)
        yield return text.Substring(startIdx);
}
Note that this solution avoids returning empty tokens. For example, if the input is:
string input = "test!!";
calling Tokenize(input, "!") will return three tokens:
test
!
!
If the requirement is that two adjacent separators should have an empty token between them, then the if (currentIdx > startIdx) condition should be removed.
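Since the question mentions speed, here is a rough sketch of the same idea that lets String.IndexOfAny do the scanning instead of testing one character at a time. The names SeparatorTokenizer and TokenizeWithIndexOfAny are made up for illustration, and the separators are taken as a char[] rather than a string. I have not benchmarked it, so any speed gain is an assumption to measure on your own data.

```csharp
using System;
using System.Collections.Generic;

public static class SeparatorTokenizer
{
    // Hypothetical helper: yields tokens and separators in order, skipping empty tokens.
    public static IEnumerable<string> TokenizeWithIndexOfAny(string text, char[] separators)
    {
        int start = 0;
        while (start < text.Length)
        {
            // Find the next separator at or after 'start' (-1 if there is none).
            int hit = text.IndexOfAny(separators, start);
            if (hit < 0)
            {
                // No more separators: the rest of the string is the last token.
                yield return text.Substring(start);
                yield break;
            }

            // Yield the text before the separator, if it is not empty.
            if (hit > start)
                yield return text.Substring(start, hit - start);

            // Yield the separator itself as a one-character string.
            yield return text[hit].ToString();

            // Continue right after the separator.
            start = hit + 1;
        }
    }
}
```

For example, string.Join("|", SeparatorTokenizer.TokenizeWithIndexOfAny("Hello world!", new[] { ' ', '!' })) gives Hello| |world|!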