
Regex.Split() sentence to words while preserving whitespace

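I am using Regex.Split() to take user input and turn it into a list of individual words, but at the moment it removes any whitespace the user added. I would like it to preserve the whitespace.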

string[] newInput = Regex.Split(updatedLine, @"\s+");

string text = "This            is some text";
var splits = Regex.Split(text, @"(?=(?<=[^\s])\s+)");

// Print each piece in brackets so the preserved leading spaces are visible.
foreach (string item in splits)
    Console.WriteLine($"[{item}]");
Console.WriteLine(splits.Length);

This gives you 4 splits, and each split keeps all of its leading spaces.

(?=\s+)

This means: split at any point that has whitespace ahead of it. Used alone, however, it creates 15 splits on the sample text, because within a run of repeated spaces every space is followed by another space, so each of them becomes a split point.

(?=(?<=[^\s])\s+)

This means: split at a point that has a non-whitespace character immediately before it and whitespace ahead of it.
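
The difference between the two patterns can be checked with a small program. This is only a sketch: the class and method names are made up for illustration, and the input is built with new string(' ', 12) on the assumption that the sample text has twelve spaces between "This" and "is".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundSplitDemo
{
    // Lookahead only: a split point in front of every whitespace character.
    public static string[] SplitLookaheadOnly(string text) =>
        Regex.Split(text, @"(?=\s+)");

    // Lookahead plus lookbehind: a split point only where a run of
    // whitespace starts right after a non-whitespace character.
    public static string[] SplitAtRunStart(string text) =>
        Regex.Split(text, @"(?=(?<=[^\s])\s+)");

    public static void Main()
    {
        // "This", twelve spaces, then "is some text" (assumed to match the sample).
        string text = "This" + new string(' ', 12) + "is some text";

        Console.WriteLine(SplitLookaheadOnly(text).Length); // 15
        Console.WriteLine(SplitAtRunStart(text).Length);    // 4

        // Brackets make the preserved leading whitespace visible.
        foreach (string piece in SplitAtRunStart(text))
            Console.WriteLine($"[{piece}]");
    }
}
```

Joining the pieces back together with string.Concat should reproduce the original input, since both patterns are zero-width and consume no characters.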

If the text starts with a space and you want that to be captured in a first split with no text, you can change the expression to the following:

(?=(?<=^|[^\s])\s+)

This means the run of spaces must be preceded by either a non-space character or the start of the string.
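
To see what the start-of-string alternative changes, here is a minimal sketch on a hypothetical input with two leading spaces (the class name, method name, and input are illustrative only). In .NET, a zero-width match at index 0 makes Regex.Split emit an empty string as the first element, so that is what the extra split looks like here; the leading spaces themselves stay attached to the first word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeadingWhitespaceDemo
{
    // Same pattern as above, with "^" added as an alternative in the lookbehind.
    public static string[] SplitWithStartAnchor(string text) =>
        Regex.Split(text, @"(?=(?<=^|[^\s])\s+)");

    public static void Main()
    {
        string text = "  hello world"; // hypothetical input, two leading spaces

        foreach (string piece in SplitWithStartAnchor(text))
            Console.WriteLine($"[{piece}]");
        // []
        // [  hello]
        // [ world]
    }
}
```

If the empty first element is not wanted, the original pattern without the "^" alternative returns just "  hello" and " world" for the same input.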

I'm guessing that some of the "words" you're interested in are actually phrases in which spaces are acceptable. You can't easily use the space character as both a phrase delimiter and an allowable character within the phrase itself. Try using a comma as the delimiter instead:

string updatedLine = "user,input,two words,even three words";
string[] newInput = Regex.Split(updatedLine, @",");

This version of the regex also allows spaces after the commas:

string updatedLine = "user, input,   two words,    even three words";
string[] newInput = Regex.Split(updatedLine, @",\s+|,");
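
As a quick check of the comma-based approach, the sketch below wraps the same pattern in a hypothetical helper (the class and method names are not from the original answer) and prints each phrase in brackets.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CommaSplitDemo
{
    // Split on a comma followed by optional whitespace.
    public static string[] SplitOnCommas(string line) =>
        Regex.Split(line, @",\s+|,");

    public static void Main()
    {
        string updatedLine = "user, input,   two words,    even three words";

        foreach (string phrase in SplitOnCommas(updatedLine))
            Console.WriteLine($"[{phrase}]");
        // [user]
        // [input]
        // [two words]
        // [even three words]
    }
}
```

The whitespace after each comma is consumed by the delimiter, while spaces inside a phrase are kept. The shorter pattern @",\s*" should behave the same way.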
