How to split a string and keep the delimiters?

I know there are many questions like mine, but I hope this one is a little different. I'm making a translator and want to split a text into sentences, but when I wrote this code:

public static string[] GetSentences(string Text)
{
    if (Text.Contains(". ") || Text.Contains("? ") || Text.Contains("! "))
        return Text.Split(new string[] { ". ", "? ", "! " }, StringSplitOptions.RemoveEmptyEntries);
    else
        return new string[0];
}

It removed the ".", "?" and "!". I want to keep them. How can I do that?


NOTE: I want to split on ". " (a dot and a space), "? " (a question mark and a space)...

Simple: replace them first. I'll use "|" as the marker for readability, but you may want something more exotic that is unlikely to occur in the text.

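// This part could be made a little smarter and more flexible.
// So, just the basic idea: put a marker after each delimiter, then split on the marker.
Text = Text.Replace(". ", ". |").Replace("? ", "? |").Replace("! ", "! |");

if (Text.Contains("|"))
    // The char[] overload also compiles on .NET Framework, which has no Split(char, StringSplitOptions).
    return Text.Split(new[] { '|' }, StringSplitOptions.RemoveEmptyEntries);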

And I wonder about the else return new string[0]; part, which seems odd. Assuming you want to return the input string when there are no delimiters, you should just remove the if/else construct: for a non-empty input, Split then returns the whole string as a single element.
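Putting the pieces of this answer together, a minimal sketch of the replace-then-split idea without the if/else might look like the following. The class name SentenceSplitter and the choice of U+001F as the marker are assumptions for illustration, not part of the original answer. Unlike the snippet above, this variant replaces the space with the marker, so the returned sentences have no trailing space.

```csharp
using System;

public static class SentenceSplitter
{
    // Hypothetical marker: U+001F (unit separator) is unlikely to appear in ordinary text.
    private const char Marker = '\u001F';

    public static string[] GetSentences(string text)
    {
        // Swap the space after each delimiter for the marker,
        // so the punctuation stays attached to its sentence.
        string marked = text
            .Replace(". ", "." + Marker)
            .Replace("? ", "?" + Marker)
            .Replace("! ", "!" + Marker);

        // With no marker present, Split returns the whole input as one element.
        return marked.Split(new[] { Marker }, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

For example, SentenceSplitter.GetSentences("Hi. How are you? Fine!") yields "Hi.", "How are you?" and "Fine!", while a text without any delimiter comes back as a single-element array.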

Regex way:

return Regex.Split(Text, @"(?<=[.?!])\s+");

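So you just split the string on whitespace that is preceded by one of ., ? or !. The lookbehind checks for the punctuation without consuming it, so it stays attached to the sentence.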
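As a rough, self-contained sketch, the regex approach could be wrapped like this. The class name RegexSentenceSplitter is made up for illustration; only Regex.Split and the pattern come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSentenceSplitter
{
    public static string[] GetSentences(string text)
    {
        // (?<=[.?!]) is a zero-width lookbehind: it requires ., ? or ! right before
        // the match but does not consume it, so only the whitespace run is removed.
        return Regex.Split(text, @"(?<=[.?!])\s+");
    }
}
```

Note that Regex.Split does not drop empty entries, so a text that ends with a delimiter followed by whitespace (such as "Hi. ") produces an empty final element. Trimming the input first is one way to avoid that.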


