
Searching for a regex to split a text into its words

I am looking for a regular expression to split a text into its words. I have tried

Regex.Split(text, @"\s+")

But for an input such as

this (is a) text. and

it gives me

this
(is
a)
text.
and

I am looking for a solution that gives me only the words, without the (, ), . and so on. It should also split a text like

end.begin

into two words.

Try this:

Regex.Split(text, @"\W+")

\W is the negation of \w, which matches word characters (letters, digits, and the underscore), so \W+ matches any run of non-word characters.
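
To see what that does with the two inputs from the question, here is a minimal sketch. The class and method names (SplitDemo, SplitWords) are made up for illustration; only Regex.Split and the \W+ pattern come from the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Hypothetical helper: splits on runs of non-word characters.
    public static string[] SplitWords(string text) => Regex.Split(text, @"\W+");

    public static void Main()
    {
        // Prints: this|is|a|text|and
        Console.WriteLine(string.Join("|", SplitWords("this (is a) text. and")));

        // Prints: end|begin
        Console.WriteLine(string.Join("|", SplitWords("end.begin")));
    }
}
```

Note that the underscore counts as a word character, so something like snake_case stays a single token with this pattern.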

You can do:

var text = "this (is a) text. and";

// Replace the unwanted punctuation characters with spaces
text = System.Text.RegularExpressions.Regex.Replace(text, "[(),.]", " ");

// Split on whitespace (a null separator array means "any whitespace")
// and drop the empty entries left behind by consecutive spaces
var splitted = text.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);

foreach (var token in splitted)
{
    Console.WriteLine(token);
}


You're probably better off matching the words rather than splitting.

If you use Split (with \W, as Regexident suggested), then you could get an extra empty string at the beginning and at the end. For example, the input string (a b) would give you four outputs: "", "a", "b", and another "", because you're using the ( and ) as separators.
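
A quick sketch of that effect, assuming an input with a and b separated by one space inside parentheses. The names EmptyEntriesDemo, SplitRaw and SplitNonEmpty are hypothetical, and filtering with Where is just one possible workaround, not something suggested in the answers here.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class EmptyEntriesDemo
{
    // Plain split: separators at the start or end of the input
    // produce empty strings at the start or end of the result.
    public static string[] SplitRaw(string text) => Regex.Split(text, @"\W+");

    // Possible workaround: drop the empty entries afterwards.
    public static string[] SplitNonEmpty(string text) =>
        Regex.Split(text, @"\W+").Where(s => s.Length > 0).ToArray();

    public static void Main()
    {
        Console.WriteLine(SplitRaw("(a b)").Length);      // 4: "", "a", "b", ""
        Console.WriteLine(SplitNonEmpty("(a b)").Length); // 2: "a", "b"
    }
}
```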

What you probably want to do is just match the words. You can do that like this:

Regex.Matches(text, "\\w+").Cast<Match>().Select(match => match.Value)

Then you'll get just the words, and no extra empty strings at the beginning and end.
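
Wrapped into something runnable, the matching approach might look like the sketch below. MatchDemo and Words are illustrative names, and the sample input is made up to cover both the parentheses case and the end.begin case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Hypothetical helper: collects every run of word characters.
    public static List<string> Words(string text) =>
        Regex.Matches(text, @"\w+")
             .Cast<Match>()
             .Select(match => match.Value)
             .ToList();

    public static void Main()
    {
        // Prints this, is, end, begin on separate lines, with no empty strings
        foreach (var word in Words("(this is) end.begin"))
        {
            Console.WriteLine(word);
        }
    }
}
```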

