How to fix HTML tags (missing opening and closing tags) with HtmlAgilityPack
I have HTML like this: <div><h1> hello Hi</div> <div>hi </p></div>
Required output: <div><h1> hello </h1></div> <div><p>hi </p></div>
Is it possible to use HTML Agility Pack to fix this kind of issue, where opening and closing tags are missing?
The library isn't intelligent enough to create the opening p where you put it, but it is intelligent enough to create the missing closing h1. In general, it always creates valid HTML, but not always the HTML you would expect.
So this code:
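using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load(yourhtml);      // Load() takes a file path or stream; use doc.LoadHtml(...) for an HTML string
doc.Save(Console.Out);   // write the repaired markup to the console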
will output this:
<div><h1> hello Hi</h1></div> <div>hi <p></div>
This is not what you want, but it is valid HTML. You can also add a little trick like this:
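// ElementsFlags is a static dictionary, so this setting applies to every document parsed afterwards
HtmlNode.ElementsFlags["p"] = HtmlElementFlag.Closed;
HtmlDocument doc = new HtmlDocument();
doc.Load(yourhtml);
doc.Save(Console.Out);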
which will output this:
<div><h1> hello Hi</h1></div> <div>hi <p></p></div>
When you call HtmlAgilityPack.HtmlDocument.LoadHtml(yourhtml), HtmlAgilityPack automatically fixes the tags for you. You can then read the repaired markup from HtmlAgilityPack.HtmlDocument.DocumentNode.OuterHtml.
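To make the second answer concrete, here is a minimal sketch that parses the fragment from the question from an in-memory string and prints the repaired markup. It assumes the HtmlAgilityPack NuGet package is referenced; the Program class and the html variable are illustrative names, not part of the original answers. It also lists doc.ParseErrors, which can help you see which tags the parser considered malformed. The exact output may differ between library versions, so none is shown here.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // The malformed fragment from the question (illustrative variable name).
        string html = "<div><h1> hello Hi</div> <div>hi </p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parses an HTML string, unlike Load(), which reads a file or stream

        // Report the problems the parser noticed while reading the input.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine($"Line {error.Line}: {error.Code} - {error.Reason}");
        }

        // Print the markup as repaired by the parser.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

As the first answer notes, the repaired markup is valid HTML but will not necessarily match the "required output" from the question, so check the result before relying on it.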