How can I load an HTML string into Webkit.net so I can access its "DOM"?

I'd like to use Webkit.net to load an (X)HTML string and then analyze the DOM in order to "compress" it: remove whitespace and newlines, and convert <input></input> and <input /> to <input> (basically an XHTML to HTML conversion, doctype allowing).

Is there any way to get the "DOM tree" in Webkit.net? If not, are there any .NET HTML parsers out there that can do this? Failing that, is there a .NET component that already does what I'm asking?

Some pseudo-code explaining what I'd like to do:

var DOM = Webkit.DOM.FromString("<!DOCTYPE HTML><html><head><title> Hello</title></head><body><INPUT Value=\"Click here\"  type=\"submit\" /><br /><span class='bold red'>An element!</span><script type='text/javascript'>/*do stuff*/</script>  <script>/*do more stuff*/</script></body></html>");

var sb = new StringBuilder();

// this would recursively iterate over all childnodes in a real scenario.
foreach(var node in DOM.Nodes){
    sb.Append(/* Compress & sort attributes, normalize & strip unneeded quotes, remove unneeded end & self-closing tags, etc. */);
}

// return optimally compressed output...
// something like:
// <!doctype html><title>Hello</title><input type=submit value="Click here"><br><span class="bold red">An element!</span><script>/*do stuff*/</script><script>/*do more stuff*/</script>
return sb.ToString();

I haven't used Webkit.Net, but I have used HtmlAgilityPack for a task similar to the one you have in mind, and it works very well. So I think you answered your own question.
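For illustration, here is a minimal sketch of how the pseudo-code in the question might look with HtmlAgilityPack. It assumes the HtmlAgilityPack NuGet package is referenced; the class name HtmlCompressor, the Compress method, and the short list of void elements are hypothetical choices made for this example, not part of the library. It only covers part of what the question asks for: it drops whitespace-only text nodes, sorts attributes, and omits end tags for void elements. It does not strip optional quotes or optional tags such as <html> and <body>.

```csharp
using System;
using System.Linq;
using System.Text;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

// Hypothetical helper for this example; not part of HtmlAgilityPack.
static class HtmlCompressor
{
    // Partial list of elements that never take an end tag in HTML.
    static readonly string[] VoidElements = { "br", "hr", "img", "input", "link", "meta" };

    public static string Compress(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var sb = new StringBuilder();
        foreach (var child in doc.DocumentNode.ChildNodes)
            Write(child, sb);
        return sb.ToString();
    }

    static void Write(HtmlNode node, StringBuilder sb)
    {
        switch (node.NodeType)
        {
            case HtmlNodeType.Text:
                // Drop whitespace-only text; keep everything else verbatim.
                var text = ((HtmlTextNode)node).Text;
                if (!string.IsNullOrWhiteSpace(text))
                    sb.Append(text);
                break;

            case HtmlNodeType.Comment:
                // HtmlAgilityPack exposes the doctype as a comment node; keep it,
                // and discard ordinary comments.
                if (node.OuterHtml.StartsWith("<!doctype", StringComparison.OrdinalIgnoreCase))
                    sb.Append(node.OuterHtml);
                break;

            case HtmlNodeType.Element:
                sb.Append('<').Append(node.Name);
                // Sort attributes by name and always double-quote the value.
                // Simplification: values that contain a double quote are not re-escaped.
                foreach (var attr in node.Attributes.OrderBy(a => a.Name, StringComparer.Ordinal))
                    sb.Append(' ').Append(attr.Name).Append("=\"").Append(attr.Value).Append('"');
                sb.Append('>');

                // Void elements get no children and no end tag.
                if (Array.IndexOf(VoidElements, node.Name) >= 0)
                    return;

                foreach (var child in node.ChildNodes)
                    Write(child, sb);
                sb.Append("</").Append(node.Name).Append('>');
                break;
        }
    }
}
```

Calling HtmlCompressor.Compress(html) on the string from the question would be the starting point. Note that dropping every whitespace-only text node can change rendering between inline elements, so a real implementation would probably collapse such runs to a single space rather than remove them.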
