HTML Parser with AngleSharp - Text in IElement
I am writing an HTML parser with AngleSharp that should take HTML input like this:
<p>
Paragraph Text
<a href="https://www.example com" class="external text" target="_new" rel="nofollow">Link Text</a>
Paragraph Text 2
</p>
and produce output like this:
<p>
Paragraph Text
<a href="https://www.example com">Link Text</a>
Paragraph Text 2
</p>
I wrote this recursive function to traverse the whole document:
using AngleSharp.Dom;
using AngleSharp.Dom.Html;
using AngleSharp.Extensions;
using AngleSharp.Parser.Html;
private void processHTMLNode(IElement node, IElement targetNode)
{
    switch (node.NodeName.ToLower())
    {
        //...
        case "a":
            if (node.HasAttribute("href") && node.GetAttribute("href").StartsWith("#"))
            {
                break;
            }
            var aNew = outputDocument.CreateElement("a");
            aNew.SetAttribute("href", node.GetAttribute("href"));
            aNew.TextContent = node.TextContent;
            targetNode.AppendChild(aNew);
            break;
        case "p":
            var pNew = outputDocument.CreateElement<IHtmlParagraphElement>();
            foreach (var childNode in node.Children)
            {
                processHTMLNode(childNode, pNew);
            }
            //TODO fix this
            pNew.TextContent = node.TextContent;
            targetNode.AppendChild(pNew);
            break;
    }
    //...
}
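The problem is that setting the TextContent property overwrites the a elements that are children of the p node. The order (text -> link -> text) is also lost.
How can I implement this correctly?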
OK, so I managed to solve my problem with the following code:
using AngleSharp.Dom;
using AngleSharp.Dom.Html;
using AngleSharp.Extensions;
using AngleSharp.Parser.Html;
private void processHTMLNode(INode node, IElement targetElement)
{
    IElement elementNode;
    IText textNode;
    if ((elementNode = node as IElement) != null)
    {
        switch (elementNode.NodeName.ToLower())
        {
            //...
            case "a":
                // Skip in-page anchors such as href="#section".
                if (elementNode.HasAttribute("href") && elementNode.GetAttribute("href").StartsWith("#"))
                {
                    break;
                }
                var aNew = outputDocument.CreateElement("a");
                // Only href is copied; class, target and rel are dropped.
                aNew.SetAttribute("href", elementNode.GetAttribute("href"));
                // ChildNodes includes text nodes; Children would return elements only.
                foreach (var childNode in elementNode.ChildNodes)
                {
                    processHTMLNode(childNode, aNew);
                }
                targetElement.AppendChild(aNew);
                break;
            case "p":
                var pNew = outputDocument.CreateElement("p");
                // Iterate ChildNodes so text and links are copied in document order.
                foreach (var childNode in elementNode.ChildNodes)
                {
                    processHTMLNode(childNode, pNew);
                }
                targetElement.AppendChild(pNew);
                break;
            //...
        }
    }
    else if ((textNode = node as IText) != null)
    {
        // Text nodes are copied as-is into the current target element.
        var newTextNode = outputDocument.CreateTextNode(textNode.Text);
        targetElement.AppendChild(newTextNode);
    }
}
This diagram from the AngleSharp documentation helped me a lot: AngleSharp DOM
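For anyone who runs into the same confusion, here is a minimal sketch of the node/element distinction that the diagram illustrates. It assumes AngleSharp 0.9.x, where HtmlParser lives in AngleSharp.Parser.Html and exposes Parse (newer versions use ParseDocument instead). The class and method names are made up for this illustration and are not part of the code above.

```csharp
using System;
using AngleSharp.Dom;
using AngleSharp.Parser.Html; // assumption: AngleSharp 0.9.x namespace

public static class ChildNodesDemo
{
    // Hypothetical helper: returns (element children, all child nodes)
    // for the first <p> found in the given markup.
    public static Tuple<int, int> CountParagraphChildren(string html)
    {
        var document = new HtmlParser().Parse(html);
        IElement p = document.QuerySelector("p");
        // Children contains elements only; ChildNodes also contains text nodes.
        return Tuple.Create(p.Children.Length, p.ChildNodes.Length);
    }

    // Hypothetical helper: shows that assigning TextContent replaces
    // every existing child with a single text node.
    public static Tuple<int, int> CountAfterSettingTextContent(string html)
    {
        var document = new HtmlParser().Parse(html);
        IElement p = document.QuerySelector("p");
        p.TextContent = p.TextContent;
        return Tuple.Create(p.Children.Length, p.ChildNodes.Length);
    }

    public static void Main()
    {
        const string html =
            "<p>Paragraph Text <a href=\"#\">Link Text</a> Paragraph Text 2</p>";

        var before = CountParagraphChildren(html);
        Console.WriteLine("Children: " + before.Item1);   // 1 (the <a> element)
        Console.WriteLine("ChildNodes: " + before.Item2); // 3 (text, <a>, text)

        var after = CountAfterSettingTextContent(html);
        Console.WriteLine("Children: " + after.Item1);    // 0 (the <a> is gone)
        Console.WriteLine("ChildNodes: " + after.Item2);  // 1 (one text node)
    }
}
```

This is why walking Children loses the text between links, and why assigning TextContent after appending child elements throws those elements away.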