RegEx split string into words by space and containing chars
How can I perform this split using the Regex.Split(input, pattern) method?
This is a [normal string ] made up of # different types # of characters
Expected output array of strings:
1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
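It should also preserve the leading spaces, so I want to keep everything. If the string contains 20 characters, the elements of the string array should add up to 20 characters in total.
What I have tried: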
Regex.Split(text, @"(?<=[ ]|# #)")
Regex.Split(text, @"(?<=[ ])(?<=# #")
I suggest matching, i.e. extracting the words, rather than splitting:
using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = @"This is a [normal string ] made up of # different types # of characters";
// Three possibilities:
// - plain word [A-Za-z]+
// - # ... # quotation
// - [ ... ] quotation
string pattern = @"[A-Za-z]+|(#.*?#)|(\[.*?\])";
// Collect the text of every match, in order of appearance
var words = Regex
    .Matches(source, pattern)
    .OfType<Match>()
    .Select(match => match.Value)
    .ToArray();
// Print the items as a numbered list
Console.WriteLine(string.Join(Environment.NewLine, words
    .Select((w, i) => $"{i + 1}. {w}")));
Result:
1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
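The question also asks that no characters be lost, so that the pieces add up to the length of the original string. The following is a minimal sketch of one way to get that with matching, assuming it is acceptable for each item to carry the whitespace that precedes it. The pattern and the variable names are illustrative and not taken from the answer above.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = "This is a [normal string ] made up of # different types # of characters";

// Optional leading whitespace, followed by one of:
//   - a [ ... ] block
//   - a # ... # block
//   - a run of non-whitespace characters
// The final alternative picks up whitespace at the very end of the input.
string pattern = @"\s*(?:\[[^\]\[]*\]|#[^#]*#|\S+)|\s+$";

string[] tokens = Regex.Matches(source, pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

// Quote each item so the leading whitespace is visible
foreach (string token in tokens)
    Console.WriteLine($"'{token}'");

// Sanity check: joining the items back together gives the original string
Console.WriteLine(string.Concat(tokens) == source);
```

Because every character is either whitespace or non-whitespace, each one ends up in exactly one item, so the lengths of the items sum to the length of the input.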
You can use
var res = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
.Where(x => !string.IsNullOrEmpty(x));
The (\[[^][]*]|#[^#]*#) part is a capturing group, and its value is output to the resulting list along with the split items.
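The effect of the capturing group can be seen on a smaller input. This is a short sketch with made-up sample strings, shown only to illustrate how Regex.Split treats captured text and why the empty-string filter is needed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Without a capturing group the delimiter is discarded
string[] plain = Regex.Split("one-two", "-");
Console.WriteLine(string.Join("|", plain));    // one|two

// With a capturing group the captured delimiter becomes an item of its own
string[] captured = Regex.Split("one-two", "(-)");
Console.WriteLine(string.Join("|", captured)); // one|-|two

// A whitespace match directly next to a bracket match leaves empty strings
// in the raw result, so they are filtered out afterwards
string[] raw = Regex.Split("a [b]", @"(\[[^\]\[]*\])|\s+");
string[] filtered = raw.Where(x => !string.IsNullOrEmpty(x)).ToArray();
Console.WriteLine(string.Join("|", filtered)); // a|[b]
```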
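Pattern details
(\[[^][]*]|#[^#]*#) - Group 1: either of the following two patterns:
    \[[^][]*] - a [, followed by 0+ characters other than [ and ], and then a ]
    #[^#]*# - a #, then 0+ characters other than #, and then a #
| - or
\s+ - 1+ whitespace characters
C# demo: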
var s = "This is a [normal string ] made up of # different types # of characters";
var results = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
.Where(x => !string.IsNullOrEmpty(x));
Console.WriteLine(string.Join("\n", results));
Result:
This
is
a
[normal string ]
made
up
of
# different types #
of
characters