Limit regex expression by character in C#

I have the following pattern, (\\s\\w+), which I need to match every word in my string that is preceded by a space.

For example, when I have this string:

many word in the textarea must be happy

I get:

 many     
 word    
 in    
 the    
 textarea    
 must    
 be    
 happy

That is correct, but when the string contains another character, for example:

many word in the textarea , must be happy

I get:

 many     
 word    
 in    
 the    
 textarea    
 must    
 be    
 happy

But "must be happy" should be ignored, because I want matching to stop as soon as another character appears in the string.

Edit:

Example 2

all cats  { in } the world are nice

Should return:

all
cats

Because { is another separator for me.

Example 3

My 3 cats are ... funny

Should return:

My
3
cats
are

Because 3 is alphanumeric and . is a separator for me.

What can I do?

To do that, you need the \G anchor, which matches at the start of the string or at the position where the previous match ended. You can do it with this pattern:

@"(?<=\G\s*)\w+"
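As a minimal sketch of how the \G pattern above could be applied, the snippet below runs it through Regex.Matches on the three sample strings from the question. The class name LeadingWordsDemo and the helper LeadingWords are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class LeadingWordsDemo
{
    // The lookbehind requires that only whitespace separates a word from the
    // end of the previous match (\G), so the chain of matches breaks at the
    // first character that is neither a word character nor whitespace.
    private static readonly Regex WordChain = new Regex(@"(?<=\G\s*)\w+");

    // Hypothetical helper: returns the words that precede the first separator.
    public static List<string> LeadingWords(string input)
    {
        return WordChain.Matches(input)
                        .Cast<Match>()
                        .Select(m => m.Value)
                        .ToList();
    }

    public static void Main()
    {
        string[] samples =
        {
            "many word in the textarea , must be happy",
            "all cats  { in } the world are nice",
            "My 3 cats are ... funny",
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(string.Join("|", LeadingWords(sample)));
        }
    }
}
```

With those three inputs this should print many|word|in|the|textarea, then all|cats, then My|3|cats|are. Each match is the bare word, so no trimming is needed.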

[^\w\s\n].*$|(\w+\s+)

Try this: grab the captures or matches, and set the m flag for multiline mode. See the demo:

http://regex101.com/r/kP4pZ2/12
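The alternation pattern above can be sketched in C# as follows. The first alternative swallows everything from the first separator to the end of the line, so only matches where group 1 took part are kept. The names AlternationDemo and LeadingWords are hypothetical, and trimming the captured whitespace is my own addition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // RegexOptions.Multiline corresponds to the "m" flag mentioned in the answer.
    private static readonly Regex Pattern =
        new Regex(@"[^\w\s\n].*$|(\w+\s+)", RegexOptions.Multiline);

    // Hypothetical helper: keeps only the matches that came from group 1,
    // i.e. a word followed by whitespace, and trims that whitespace off.
    public static List<string> LeadingWords(string input)
    {
        return Pattern.Matches(input)
                      .Cast<Match>()
                      .Where(m => m.Groups[1].Success)
                      .Select(m => m.Groups[1].Value.Trim())
                      .ToList();
    }

    public static void Main()
    {
        string input = "many word in the textarea , must be happy";
        Console.WriteLine(string.Join("|", LeadingWords(input)));
    }
}
```

One caveat with the pattern as written: group 1 requires whitespace after each word, so a final word at the very end of the input (such as "happy" in the first example of the question) is not captured.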

I think Sam I Am's comment is correct: you'll require two regular expressions.

  1. Capture the text up to a non-word character.
  2. Capture all the words with a space on one side.

Here's the corresponding code:

  1. "^(\\w+\\s+)+"
  2. "(\\w+\\s+)"

You can combine these two to capture just the individual words pretty easily, like so:

"^(\\w+\\s+)+"

Here's a complete piece of code demonstrating the pattern:

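using System;
using System.Text.RegularExpressions;

string input = "many word in the textarea , must be happy";

// Anchored at the start: repeats "word + whitespace" until a non-word character is hit.
string pattern = "^(\\w+\\s+)+";

Match match = Regex.Match(input, pattern);

// Never throws a NullReferenceException: if the match fails, the GroupCollection
// indexer still returns a Group, just with an empty Captures collection.
foreach (Capture capture in match.Groups[1].Captures)
{
    // Each value keeps its trailing whitespace, e.g. "many ".
    Console.WriteLine(capture.Value);
}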

EDIT

Check out Casimir et Hippolyte's answer for a really clean solution.

All in one regex :-) The result ends up in list:

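using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

Regex regex = new Regex(@"^((\w+)\s*)+([^\w\s]|$).*");

// inputString is the text to scan.
Match m = regex.Match(inputString);
if (m.Success)
{
    // Group 2 is captured once per word; collect every capture, not just the last one.
    List<string> list = m.Groups[2].Captures
        .Cast<Capture>()
        .Select(c => c.Value)
        .ToList();
}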
