
How to fix HTML tags (which are missing the <open> & <close> tags) with HTMLAgilityPack

I have an HTML fragment like this: <div><h1> hello Hi</div> <div>hi </p></div>

Required output: <div><h1> hello </h1></div> <div><p>hi </p></div>

Is it possible to fix this kind of issue, with missing closing and opening tags, using HTML Agility Pack?

The library isn't intelligent enough to create the opening p where you want it, but it is intelligent enough to create the missing closing h1. In general, it always produces valid HTML, but not always the HTML you would expect.

So this code:

        // Load reads from a file path, stream, or reader; use LoadHtml for an in-memory string.
        HtmlDocument doc = new HtmlDocument();
        doc.Load(yourhtml);
        doc.Save(Console.Out);

will dump this:

<div><h1> hello Hi</h1></div> <div>hi <p></div>

This is not what you want, but it is valid HTML. You can also add a little trick like this:

        // ElementsFlags is static, so this setting applies to every document parsed afterwards.
        HtmlNode.ElementsFlags["p"] = HtmlElementFlag.Closed;
        HtmlDocument doc = new HtmlDocument();
        doc.Load(yourhtml);
        doc.Save(Console.Out);

that will dump this:

<div><h1> hello Hi</h1></div> <div>hi <p></p></div>

When you call HtmlAgilityPack.HtmlDocument.LoadHtml(yourhtml), HTMLAgilityPack automatically fixes the tags for you. You can then access the repaired markup with HtmlAgilityPack.HtmlDocument.DocumentNode.OuterHtml.
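
The parse-then-serialize approach can be sketched as a small console program. This is only an illustration, assuming the HtmlAgilityPack NuGet package is referenced; the class name FixTagsExample and the variable names are made up for the example. As noted in the other answer, the serialized markup may not match the output you were hoping for, so it is worth printing the parse errors alongside it.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical example class; not part of HtmlAgilityPack.
class FixTagsExample
{
    static void Main()
    {
        // Malformed input from the question: unclosed <h1>, stray </p>.
        string html = "<div><h1> hello Hi</div> <div>hi </p></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html); // LoadHtml parses an in-memory string

        // ParseErrors lists the problems the parser noticed in the input.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.WriteLine($"{error.Code} at line {error.Line}: {error.Reason}");
        }

        // Serialize the parsed tree back to markup.
        Console.WriteLine(doc.DocumentNode.OuterHtml);
    }
}
```

If the result still lacks a tag you need (such as the opening p), you would have to adjust the node tree yourself before serializing, since the parser only guarantees a consistent tree, not the one you intended.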
