
Limit regex expression by character in C#

I have the following pattern (\\s\\w+). I need it to match every word in my string, together with a space.

For example

When I have this string

many word in the textarea must be happy

I get

 many     
 word    
 in    
 the    
 textarea    
 must    
 be    
 happy

That is correct, but when the string contains another character, for example

many word in the textarea , must be happy

I get

 many     
 word    
 in    
 the    
 textarea    
 must    
 be    
 happy

But "must be happy" should be ignored, because I want the matching to stop as soon as any other character appears in the string.

Edit:

Example 2

all cats  { in } the world are nice

Should return

all
cats

Because { is also a separator for me.

Example 3

My 3 cats are ... funny

Should return

My
3
cats
are

Because 3 is alphanumeric and . is a separator for me.

What can I do?

To do that, you need the \G anchor, which matches the position at the start of the string or right after the previous match. You can do it with this pattern:

@"(?<=\G\s*)\w+"
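As a rough illustration of how that pattern could be used, here is a minimal sketch that runs it through Regex.Matches on the three examples from the question. The class name GAnchorWords and the method name Extract are made up for this sketch and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this sketch.
public static class GAnchorWords
{
    // \G ties each match to the end of the previous one (or to the start of the search),
    // allowing only whitespace in between. The chain of matches therefore breaks at the
    // first character that is neither a word character nor whitespace.
    private static readonly Regex Word = new Regex(@"(?<=\G\s*)\w+");

    public static List<string> Extract(string input)
    {
        return Word.Matches(input)
                   .Cast<Match>()
                   .Select(m => m.Value)
                   .ToList();
    }

    public static void Main()
    {
        string[] samples =
        {
            "many word in the textarea , must be happy",
            "all cats  { in } the world are nice",
            "My 3 cats are ... funny"
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(string.Join("|", Extract(sample)));
        }
    }
}
```

For the three sample strings this should print many|word|in|the|textarea, then all|cats, then My|3|cats|are. Because the lookbehind has variable length, this relies on the .NET regex engine; many other engines would reject it.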

Try this:

[^\w\s\n].*$|(\w+\s+)

Grab the captures from group 1. Set the m flag (RegexOptions.Multiline in .NET) for multiline mode.

See demo:

http://regex101.com/r/kP4pZ2/12
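The demo link targets regex101 rather than .NET, so here is a minimal sketch of how the same alternation might be consumed in C#. It assumes that only matches where group 1 participated are wanted; the class name AlternationWords and the method name Extract are invented for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, named only for this sketch.
public static class AlternationWords
{
    // First alternative: swallow everything from the first separator to the end of the line.
    // Second alternative: capture a word plus its trailing whitespace in group 1.
    private static readonly Regex Pattern =
        new Regex(@"[^\w\s\n].*$|(\w+\s+)", RegexOptions.Multiline);

    public static List<string> Extract(string input)
    {
        var words = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            // Group 1 is unsuccessful for the "separator and rest of line" match; skip it.
            if (m.Groups[1].Success)
            {
                words.Add(m.Groups[1].Value.Trim());
            }
        }
        return words;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join("|", Extract("many word in the textarea , must be happy")));
        Console.WriteLine(string.Join("|", Extract("all cats  { in } the world are nice")));
    }
}
```

One thing to watch with this pattern: a word is only captured when whitespace follows it, so the last word of a string that contains no separator at all (such as "happy" in the first example) is not returned.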

I think Sam I Am's comment is correct: you'll require two regular expressions.

  1. Capture the text up to a non-word character.
  2. Capture all the words with a space on one side.

Here's the corresponding code:

  1. "^(\\w+\\s+)+"
  2. "(\\w+\\s+)"

You can combine these two to capture just the individual words pretty easily, like so:

"^(\\w+\\s+)+"

Here's a complete piece of code demonstrating the pattern:

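using System;
using System.Text.RegularExpressions;

string input = "many word in the textarea , must be happy";

string pattern = "^(\\w+\\s+)+";

Match match = Regex.Match(input, pattern);

// Never throws a NullReferenceException: the GroupCollection indexer returns
// an unsuccessful Group rather than null when nothing matched.
// Each capture includes its trailing whitespace, hence the Trim().
foreach (Capture capture in match.Groups[1].Captures)
{
    Console.WriteLine(capture.Value.Trim());
}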

EDIT

Check out Casimir et Hippolyte's answer for a really clean solution.

All in one regex :-) The result ends up in list.

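using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input taken from the question.
string inputString = "many word in the textarea , must be happy";

Regex regex = new Regex(@"^((\w+)\s*)+([^\w\s]|$).*");

Match m = regex.Match(inputString);
if (m.Success)
{
    // Group 2 is captured once per word before the first separator.
    List<string> list =
        m.Groups[2].Captures.Cast<Capture>()
            .Select(c => c.Value).ToList();
}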
