
HTML to RichTextBox as Plaintext with Hyperlinks

Having read so much about not using regexes to strip HTML, I am wondering how to get some links into my RichTextBox without also getting all the messy HTML that comes with the content I download from a newspaper site.

What I have: HTML from a newspaper website.

What I want: the article as plain text in a RichTextBox, but with links (that is, replacing <a href="foo">bar</a> with <Hyperlink NavigateUri="foo">bar</Hyperlink>).

HtmlAgilityPack gives me HtmlNode.InnerText (stripped of all HTML tags) and HtmlNode.InnerHtml (with all tags). I can get the URL and text of the link(s) with articlenode.SelectNodes(".//a"), but how do I know where to insert them into the plain text of HtmlNode.InnerText?

Any hint would be appreciated.

Here is how you can do it (shown as a console app, but the idea is the same for Silverlight):

Let's suppose you have this HTML:

<html>
<head></head>
<body>
Link 1: <a href="foo1">bar</a>
Link 2: <a href="foo2">bar2</a>
</body>
</html>

Then this code:

using System;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(myFileHtm); // myFileHtm: path to the downloaded HTML file

// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
if (links != null)
{
    foreach (HtmlNode node in links)
    {
        // replace the <a> element in the DOM at the exact same place
        // with a deep clone that has a different name
        HtmlNode newNode = node.ParentNode.ReplaceChild(node.CloneNode("Hyperlink", true), node);

        // move the href value into a NavigateUri attribute
        newNode.SetAttributeValue("NavigateUri", newNode.GetAttributeValue("href", null));
        newNode.Attributes.Remove("href");
    }
}
doc.Save(Console.Out);

will output this:

<html>
<head></head>
<body>
Link 1: <hyperlink navigateuri="foo1">bar</hyperlink>
Link 2: <hyperlink navigateuri="foo2">bar2</hyperlink>
</body>
</html>
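The renaming approach above still leaves markup that has to be turned into RichTextBox content. Another way to answer the original question (where the links go relative to InnerText) is to skip InnerText entirely and walk the DOM in document order, emitting a flat list of plain-text and link segments; each link then sits exactly where it appeared in the HTML, with no character offsets involved. This is a minimal sketch, not the answerer's method: Segment and HtmlToSegments are hypothetical names, it assumes the HtmlAgilityPack NuGet package and C# 9 or later, and skipping script/style is an assumption about what counts as article text.

```csharp
// Requires the HtmlAgilityPack NuGet package and C# 9 or later.
#nullable enable
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical type: one piece of the article, either plain text (Url == null)
// or the visible text of a link plus its href.
public sealed record Segment(string Text, string? Url)
{
    public bool IsLink => Url != null;
}

// Hypothetical helper illustrating a document-order DOM walk.
public static class HtmlToSegments
{
    public static List<Segment> Extract(HtmlNode root)
    {
        var segments = new List<Segment>();
        Walk(root, segments);
        return segments;
    }

    private static void Walk(HtmlNode node, List<Segment> segments)
    {
        foreach (HtmlNode child in node.ChildNodes)
        {
            switch (child.NodeType)
            {
                case HtmlNodeType.Text:
                    // Text nodes hold raw HTML text; decode entities such as &amp;
                    string text = HtmlEntity.DeEntitize(((HtmlTextNode)child).Text);
                    if (text.Length > 0)
                        segments.Add(new Segment(text, null));
                    break;

                case HtmlNodeType.Element when child.Name == "a":
                    // Emit the link in place; its InnerText is the visible label
                    segments.Add(new Segment(
                        HtmlEntity.DeEntitize(child.InnerText),
                        child.GetAttributeValue("href", null)));
                    break;

                case HtmlNodeType.Element when child.Name is "script" or "style":
                    // Assumption: script and style content is not article text
                    break;

                case HtmlNodeType.Element:
                    // Descend into <p>, <div>, <span>, ... keeping document order
                    Walk(child, segments);
                    break;

                // Comments and other node types are ignored
            }
        }
    }
}
```

For the HTML in this answer, HtmlToSegments.Extract(doc.DocumentNode) should yield the text "Link 1: ", then a link segment bar → foo1, then "Link 2: ", then bar2 → foo2, interleaved with whitespace-only text segments from the line breaks. In Silverlight or WPF, each text segment could then become a Run and each link segment a Hyperlink whose NavigateUri is built from Url, added to a Paragraph in that order.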
