What regex should I use to remove links from HTML code in C#?

I have an HTML string and want to replace every link with just its text.

E.g., given

Some text <a href="http://google.com/">Google</a>.

I need to get

Some text Google.

What regex should I use?

Several similar questions have been posted, and the best practice is to use the Html Agility Pack, which is built specifically for tasks like this.

http://www.codeplex.com/htmlagilitypack
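
A minimal sketch of what the Html Agility Pack approach could look like, assuming the HtmlAgilityPack NuGet package is referenced. The class name HapLinkStripper and the method name StripLinks are made up for illustration; this is one possible way to do it, not the library's prescribed recipe.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper, not part of Html Agility Pack itself.
public static class HapLinkStripper
{
    public static string StripLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors == null)
        {
            return html;
        }

        // Copy to a list first so the tree can be modified while iterating.
        foreach (var anchor in anchors.ToList())
        {
            // Swap each <a> element for a plain text node holding its inner text.
            var text = doc.CreateTextNode(anchor.InnerText);
            anchor.ParentNode.ReplaceChild(text, anchor);
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Because InnerText flattens the anchor's content, any markup nested inside a link (such as a bold tag) is dropped along with the link itself.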

I asked for a simple regex (thanks, Fabrian). The code is as follows:

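// Requires: using System.Text.RegularExpressions;
// Inside a verbatim string (@"..."), a literal double quote is written as "".
var html = @"Some text <a href=""http://google.com/"">Google</a>.";
// Remove every opening <a href=...> tag (lazy match up to the first '>').
Regex r = new Regex(@"\<a href=.*?\>");
html = r.Replace(html, "");
// Remove every closing </a> tag.
r = new Regex(@"\</a\>");
html = r.Replace(html, "");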
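
var html = "<a ....>some text</a>";
// The named group "anchortext" lazily captures everything between the tags.
var ripper = new Regex("<a.*?>(?<anchortext>.*?)</a>", RegexOptions.IgnoreCase);
// Match returns only the first anchor, and any text around it is discarded.
html = ripper.Match(html).Groups["anchortext"].Value;
// html is now "some text"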
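
The two regex ideas above can be combined into a single pass that keeps the surrounding text and handles every link in the string. This is a sketch only: the names LinkStripper, StripLinks and the group name "text" are made up for illustration, and the pattern assumes no attribute value contains a '>' character and that anchors are not nested.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating a single-pass replacement.
public static class LinkStripper
{
    // <a\b[^>]*>  opening tag; \b keeps tags such as <abbr> from matching
    // (?<text>.*?) lazily captures the link text up to the nearest closing tag
    // </a\s*>     closing tag, tolerating whitespace before '>'
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*>(?<text>.*?)</a\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static string StripLinks(string html)
    {
        // Replace each whole anchor element with just its captured text.
        return Anchor.Replace(html, "${text}");
    }
}

public static class Program
{
    public static void Main()
    {
        var html = @"Some text <a href=""http://google.com/"">Google</a>.";
        Console.WriteLine(LinkStripper.StripLinks(html)); // Some text Google.
    }
}
```

RegexOptions.Singleline lets the link text span line breaks, and IgnoreCase also covers upper-case tags such as <A HREF=...>.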

