简体   繁体   中英

How can I load an HTML string into Webkit.net so I can access its “DOM”

I'd like to use Webkit.net to load an (X)HTML string and then analyze the DOM in order to "compress" it (remove whitespace, newlines, convert <input></input> and <input /> to <input> (basically an XHTML to HTML conversion, doctype allowing).

Is there anyway to do get the "DOM tree" in webkit.net? If not, are there any .net HTML parsers out there that can do this? If not, is there a .net component that already does what I'm asking?

Some Pseudo-code explaining what I'd like to do:

var DOM = Webkit.DOM.FromString("<!DOCTYPE HTML><html><head><title> Hello</title></head><body><INPUT Value="Click here"  type="submit" /><br /><span class='bold red'>An element!</span><script type='text-javascript'>/*do stuff*/</script>  <script>/*do more stuff*/</script></body></html>");

var sb = new StringBuilder();

// this would recursively iterate over all childnodes in a real scenario.
foreach(var node in DOM.Nodes){
    sb.Append(/* Compress & sort attributes, normalize & strip unneeded quotes, remove unneeded end & self-closing tags, etc. */);
}

// return optimally compressed output...
// something like:
// <!doctype html><title>Hello</title><input type=submit value="Click here"><br><span class="bold red">An element!</span><script>/*do stuff*/</script><script>/*do more stuff*/</script>
return sb.ToString();

Haven't used Webkit.Net but I have used HTMLAgilityPack to do a similar task to the one you have in mind and it works very well. So I think you answered your own question.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM