Remove HTML nodes from HTTP Request
I have some HTML code stored in a string variable, the result of an HttpWebRequest:
<html>
<head>
<div>Lots of scripts and libraries</div>
</head>
<body>
<div>Some very useful data</div>
</body>
<footer>
<div>Not interesting stuff</div>
</footer>
</html>
How can I remove all unnecessary nodes and end up with this:
<body>
<div>Some very useful data</div>
</body>
The easiest way is to use HtmlAgilityPack to grab just the body tag.
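// Parse the response string (html) with HtmlAgilityPack.
var document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(html);
// SelectSingleNode returns null if the document has no <body> element.
HtmlAgilityPack.HtmlNode body = document.DocumentNode.SelectSingleNode("//body");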
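To get the markup from the question back as a string, one option is to read the node's OuterHtml property, which includes the body tags themselves (InnerHtml would give only the children). This is a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the names BodyExtractor and ExtractBody are made up for illustration.

```csharp
using HtmlAgilityPack;

public static class BodyExtractor
{
    // Returns the markup of the <body> element, including its own tags,
    // or null when the input contains no <body> element.
    public static string ExtractBody(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectSingleNode returns null when the XPath matches nothing,
        // so the null-conditional operator guards the property access.
        HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
        return body?.OuterHtml;
    }
}
```

Calling BodyExtractor.ExtractBody(html) on the response string should then yield just the body element, with head and footer left out.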
From there, you can use HtmlAgilityPack to further parse the body node for more detail.
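As one illustration of parsing the body node further, the sketch below collects the text of every div under it. It assumes the HtmlAgilityPack NuGet package is referenced; BodyReader and DivTexts are hypothetical names, and choosing div elements is only an example based on the sample markup.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class BodyReader
{
    // Collects the trimmed text of every <div> found under <body>.
    // Returns an empty list when there is no <body> or no <div>.
    public static List<string> DivTexts(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var texts = new List<string>();
        HtmlNode body = document.DocumentNode.SelectSingleNode("//body");
        if (body == null)
        {
            return texts;
        }

        // ".//div" is evaluated relative to the body node.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = body.SelectNodes(".//div");
        if (divs == null)
        {
            return texts;
        }

        foreach (HtmlNode div in divs)
        {
            texts.Add(div.InnerText.Trim());
        }
        return texts;
    }
}
```

Note that with nested div elements the outer div's InnerText already contains the inner text, so the same text can appear more than once in the result.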