我将HTML代码转换为纯文本。但是有很多额外的返回和空格。如何删除它们?
string new_string = Regex.Replace(orig_string, @"\\s", "")
will remove all whitespace
string new_string = Regex.Replace(orig_string, @"\\s+", " ")
will just collapse multiple whitespaces into one
I'm assuming that you want to
If that's correct, then you could use
resultString = Regex.Replace(subjectString, @"( |\r?\n)\1+", "$1");
This keeps the original "type" of whitespace intact and also preserves Windows line endings correctly. If you also want to "condense" multiple tabs into one, use
resultString = Regex.Replace(subjectString, @"( |\t|\r?\n)\1+", "$1");
To condense a string of newlines and spaces (any number of each) into a single newline, use
resultString = Regex.Replace(subjectString, @"(?:(?:\r?\n)+ +){2,}", @"\n");
I used a lot of algorithm for that. Every loop was good but this was clear and absolute.
//define what you want to remove as char
char tb = (char)9; //Tab char ascii code
spc = (char)32; //space char ascii code
nwln = (char)10; //New line char ascii char
yourstring.Replace(tb,"");
yourstring.Replace(spc,"");
yourstring.Replace(nwln,"");
//by defining chars, result was better.
You can use Trim() to remove the spaces and returns. In HTML the spaces is not important so you can omit them by using the Trim() method in System.String class.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.