简体   繁体   中英

How to remove extra returns and spaces in a string by regex?

我将HTML代码转换为纯文本。但是有很多额外的返回和空格。如何删除它们?

string new_string = Regex.Replace(orig_string, @"\\s", "") will remove all whitespace

string new_string = Regex.Replace(orig_string, @"\\s+", " ") will just collapse multiple whitespaces into one

I'm assuming that you want to

  • find two or more consecutive spaces and replace them with a single space, and
  • find two or more consecutive newlines and replace them with a single newline.

If that's correct, then you could use

resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");

This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use

resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");

To condense a string of newlines and spaces (any number of each) into a single newline, use

resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", @"\n");

I used a lot of algorithm for that. Every loop was good but this was clear and absolute.

//define what you want to remove as char

char tb = (char)9; //Tab char ascii code
spc = (char)32;    //space char ascii code
nwln = (char)10;   //New line char ascii char

yourstring.Replace(tb,"");
yourstring.Replace(spc,"");
yourstring.Replace(nwln,"");

//by defining chars, result was better.

You can use Trim() to remove the spaces and returns. In HTML the spaces is not important so you can omit them by using the Trim() method in System.String class.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM