RegEx: split a string into words by spaces while keeping delimited character groups together

Question

How can one perform this split with the Regex.Split(input, pattern) method?

This is a [normal string ] made up of # different types # of characters

Expected output (array of strings):

1. This 
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters

It should also keep the leading spaces; I want to preserve everything. If the string contains 20 characters, the elements of the resulting array should total 20 characters.

What I have tried:

Regex.Split(text, @"(?<=[ ]|# #)")

Regex.Split(text, @"(?<=[ ])(?<=# #")

Answer 1

I suggest matching, i.e. extracting words, rather than splitting:

using System;
using System.Linq;
using System.Text.RegularExpressions;

string source = @"This is a [normal string ] made up of # different types # of characters";

// Three possibilities:
//   - plain word [A-Za-z]+
//   - # ... # quotation
//   - [ ... ] quotation  
string pattern = @"[A-Za-z]+|(#.*?#)|(\[.*?\])";

var words = Regex
  .Matches(source, pattern)
  .OfType<Match>()
  .Select(match => match.Value)
  .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, words
  .Select((w, i) => $"{i + 1}. {w}")));

Outcome:

1. This
2. is
3. a
4. [normal string ]
5. made
6. up
7. of
8. # different types #
9. of
10. characters
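
The question also asks that no characters be lost, and the pattern above skips the whitespace between matches. A minimal sketch of one way to keep it, assuming it is acceptable for each element to carry the whitespace that follows it (the class name, method name, and pattern below are illustrative and not part of the original answer):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WhitespacePreservingTokenizer
{
    // Hypothetical pattern, not taken from any answer in this thread.
    // Alternatives, tried in order:
    //   ^\s+            leading whitespace, kept as its own element
    //   \[[^\]\[]*\]    a [ ... ] group
    //   #[^#]*#         a # ... # group
    //   \S+             any other run of non-whitespace characters
    // Every token except the leading-whitespace one also takes the
    // whitespace that follows it (\s*).
    private const string Pattern = @"^\s+|(?:\[[^\]\[]*\]|#[^#]*#|\S+)\s*";

    public static string[] Tokenize(string input) =>
        Regex.Matches(input, Pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();

    public static void Main()
    {
        var source = "This is a [normal string ] made up of # different types # of characters";
        var tokens = Tokenize(source);

        // Quote each element so the trailing whitespace is visible.
        foreach (var token in tokens)
            Console.WriteLine($"'{token}'");

        // Check that the pieces join back into the original string.
        Console.WriteLine(string.Concat(tokens) == source);
    }
}
```

Because every character is either whitespace or part of a token, concatenating the elements gives back the input, so the total length is preserved. An unclosed [ or # simply falls through to the \S+ alternative.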

Answer 2

You may use

var res = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));

See the regex demo.

The (\[[^][]*]|#[^#]*#) part is a capturing group whose value is output to the resulting list along with the split items.

Pattern details

(\[[^][]*]|#[^#]*#) - Group 1: either of the two patterns:
- \[[^][]*] - a [, followed by 0+ chars other than [ and ], and then a ]
- #[^#]*# - a #, then 0+ chars other than #, and then a #
| - or
\s+ - 1+ whitespace characters

C# demo:

using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = "This is a [normal string ] made up of # different types # of characters";
var results = Regex.Split(s, @"(\[[^][]*]|#[^#]*#)|\s+")
    .Where(x => !string.IsNullOrEmpty(x));
Console.WriteLine(string.Join("\n", results));

Result:

This
is
a
[normal string ]
made
up
of
# different types #
of
characters
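
To see why the capturing group matters, here is a small sketch comparing the pattern with and without it. The short input string and the helper name SplitNonEmpty are made up for illustration. In .NET, Regex.Split adds the text captured by groups to the result, while text matched outside any group is treated as a delimiter and discarded:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitCaptureDemo
{
    // Hypothetical helper: split, then drop the empty strings that appear
    // between adjacent matches (for example a space right before a "[").
    public static string[] SplitNonEmpty(string input, string pattern) =>
        Regex.Split(input, pattern)
             .Where(x => !string.IsNullOrEmpty(x))
             .ToArray();

    public static void Main()
    {
        var s = "a [b c] d";

        // Group 1 wraps the bracket/hash alternatives, so their text is returned.
        var kept = SplitNonEmpty(s, @"(\[[^][]*]|#[^#]*#)|\s+");

        // Same alternatives without the group: they act as plain delimiters.
        var dropped = SplitNonEmpty(s, @"\[[^][]*]|#[^#]*#|\s+");

        Console.WriteLine(string.Join("|", kept));    // a|[b c]|d
        Console.WriteLine(string.Join("|", dropped)); // a|d
    }
}
```

The Where filter is needed in both cases because two matches that touch each other leave an empty string between them.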

Answer 3

It would be easier to use a matching approach; however, it can be done with negative lookaheads:

[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#{2})*[^#]*$)

This matches a space that is not followed by:

- any sequence of characters other than [ or ], followed by ]
- a # followed by an even number of #
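
As a rough illustration, the pattern can be passed to Regex.Split as shown below. The class and method names are invented for this sketch. Note that the spaces matched by the pattern are consumed as delimiters, so this variant does not preserve them:

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadSplitDemo
{
    // The pattern from this answer: a space that is neither inside [ ... ]
    // nor inside # ... #, according to the two negative lookaheads.
    private const string Pattern =
        @"[ ](?![^\]\[]*\])(?![^#]*\#([^#]*\#{2})*[^#]*$)";

    // Hypothetical wrapper around Regex.Split for this pattern.
    public static string[] SplitOutsideGroups(string input) =>
        Regex.Split(input, Pattern);

    public static void Main()
    {
        var source = "This is a [normal string ] made up of # different types # of characters";

        foreach (var part in SplitOutsideGroups(source))
            Console.WriteLine(part);
    }
}
```

For the sample sentence this produces the same ten elements as the matching approach. Consecutive spaces in the input would produce empty elements, which may need filtering.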
