简体   繁体   中英

C# Beginner: Delete ALL between two characters in a string (Regex?)

i have a string with an html code. i want to remove all html tags. so all characters between < and >.

This is my code snipped:

WebClient wClient = new WebClient();
SourceCode = wClient.DownloadString( txtSourceURL.Text );
txtSourceCode.Text = SourceCode;
//remove here all between "<" and ">"
txtSourceCodeFormatted.Text = SourceCode;

hope somebody can help me

Try this:

txtSourceCodeFormatted.Text = Regex.Replace(SourceCode, "<.*?>", string.Empty);

But, as others have mentioned, handle with care .

According to Ravi's answer , you can use

string noHTML = Regex.Replace(inputHTML, @"<[^>]+>|&nbsp;", "").Trim();

or

string noHTMLNormalised = Regex.Replace(noHTML, @"\s{2,}", " ");

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM