
Fetching an embedded HTML document using C#

I am developing a WinForms application that automates some tasks on an internal website, "xyz.org". When I run the IE or Chrome debugger on the website, I see the following code:

<!DOCTYPE html>
<html>....
<body>
  <outer code>....
  <div id="embedded">
  <iframe name="frame1" id="frame1" src="https://qwe.org" border="0" frameborder="0" style="height: 3675px;">
     <!DOCTYPE html>
     <html>
       <inner code>....
     </html>
  </iframe>
  </div>
</body>
</html>

So the xyz website has some scripts that eventually load another website, qwe. I am using C# with the WebBrowser control, and I am trying to parse the "full" xyz and qwe HTML documents as they appear in the IE/Chrome debugger. Here is my code:

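// Requires a COM reference to "Microsoft HTML Object Library" (mshtml).
mshtml.HTMLDocument doc = webBrowser1.Document.DomDocument as mshtml.HTMLDocument;
// outerHTML of the root element serializes only the top-level (xyz) document.
string html = doc.documentElement.outerHTML;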

The resulting html string shows the following:

<!DOCTYPE html>
<html>....
<body>
  <outer code>....
  <div id="embedded">
  <iframe name="frame1" id="frame1" src="https://qwe.org" border="0" frameborder="0" style="height: 3675px;">
  </iframe>
  </div>
</body>
</html>
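The `<iframe>` above is empty because an iframe's content is a separate document object, and `outerHTML` on the parent serializes only the parent's own DOM. With the WinForms `WebBrowser` control, frame documents can in principle be reached through `HtmlDocument.Window.Frames`. The following is a minimal sketch (the `FrameDumper` class and its method names are hypothetical) that appends each accessible frame's markup after its parent's. Because `frame1` is served from a different host (qwe.org) than the page (xyz.org), the IE engine behind `WebBrowser` is likely to refuse access with an `UnauthorizedAccessException`; the sketch records that case as a comment instead of crashing.

```csharp
using System;
using System.Text;
using System.Windows.Forms;

// Hypothetical helper, not part of the original question.
static class FrameDumper
{
    // Returns the markup of the document followed by the markup of every
    // nested frame document that the WebBrowser control lets us read.
    public static string DumpDocumentWithFrames(HtmlDocument doc)
    {
        var sb = new StringBuilder();
        AppendDocument(doc, sb);
        return sb.ToString();
    }

    private static void AppendDocument(HtmlDocument doc, StringBuilder sb)
    {
        // HtmlDocument has no DocumentElement property, so take the <html> element.
        HtmlElementCollection roots = doc.GetElementsByTagName("html");
        if (roots.Count > 0)
            sb.AppendLine(roots[0].OuterHtml);

        HtmlWindowCollection frames = doc.Window.Frames;
        if (frames == null)
            return;

        for (int i = 0; i < frames.Count; i++)
        {
            try
            {
                // Reading a cross-domain frame's Document typically throws here.
                HtmlDocument frameDoc = frames[i].Document;
                if (frameDoc != null)
                {
                    sb.AppendLine($"<!-- frame {i}: {frameDoc.Url} -->");
                    AppendDocument(frameDoc, sb);
                }
            }
            catch (UnauthorizedAccessException)
            {
                sb.AppendLine($"<!-- frame {i}: access denied (cross-domain) -->");
            }
        }
    }
}
```

Call it once the top-level document has finished loading, e.g. from the `DocumentCompleted` handler when `webBrowser1.ReadyState == WebBrowserReadyState.Complete`, since `DocumentCompleted` also fires for each individual frame.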

What is missing is the document of the qwe website:

<!DOCTYPE html>
<html>
  <inner code>....
</html>

Is there a way to fetch that missing part, the embedded qwe document, into the same html string, the way the IE/Chrome debugger shows it?
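If the inner document's markup can be obtained separately, it can be spliced back into the parent's markup so the result nests the way the debugger shows it. Below is a minimal sketch using a regular expression keyed on the iframe's `id` (`frame1` in the question); `HtmlSplicer` and `InsertFrameContent` are hypothetical names. Regex-based editing of HTML is fragile, so an HTML parser such as HtmlAgilityPack is the more robust choice for real pages.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question.
static class HtmlSplicer
{
    // Returns a copy of outerHtml in which the content of the first <iframe>
    // whose id attribute equals frameId is replaced by innerHtml.
    // If no such iframe is found, outerHtml is returned unchanged.
    public static string InsertFrameContent(string outerHtml, string frameId, string innerHtml)
    {
        // Group 1: the opening <iframe ...> tag carrying id="frameId" (quotes optional).
        // Group 2: whatever the element currently contains (usually nothing).
        // Group 3: the closing </iframe> tag.
        var iframe = new Regex(
            @"(<iframe\b[^>]*?\sid\s*=\s*[""']?" + Regex.Escape(frameId) +
            @"[""']?(?=[\s/>])[^>]*>)(.*?)(</iframe>)",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // A MatchEvaluator (rather than a replacement pattern) keeps any "$"
        // characters inside innerHtml from being treated as substitutions.
        return iframe.Replace(
            outerHtml,
            m => m.Groups[1].Value + innerHtml + m.Groups[3].Value,
            1);
    }
}
```

For example, `HtmlSplicer.InsertFrameContent(html, "frame1", innerHtml)` places the qwe markup between the `<iframe ...>` and `</iframe>` tags of the xyz markup.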

You can try using the C# CEF wrapper CefSharp (https://cefsharp.github.io/) to actually load the whole page and retrieve the code. There also seems to be a subproject for exactly what you need: https://www.nuget.org/packages/CefSharp.OffScreen/

CefSharp.OffScreen provides a "headless" browser control for automation projects.
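A minimal sketch of that approach, assuming a recent CefSharp.OffScreen NuGet package (one where `ChromiumWebBrowser.WaitForInitialLoadAsync` is available) and that the iframe keeps the name `frame1` from the question. Unlike the IE-based `WebBrowser` control, CEF lets the host application read each frame directly, so the cross-domain restriction on the parent page's scripts should not get in the way. `FrameSourceFetcher` and `FetchAsync` are hypothetical names, and the markup Chromium returns may differ in small details (attribute order, whitespace) from what DevTools displays.

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using CefSharp;
using CefSharp.OffScreen;

// Hypothetical helper, not part of the original answer.
static class FrameSourceFetcher
{
    // Loads url in an off-screen Chromium browser and returns the markup of
    // the main document followed by the markup of the frame named frameName.
    // Assumes Cef.Initialize(...) has already been called once in this process.
    public static async Task<string> FetchAsync(string url, string frameName)
    {
        using (var browser = new ChromiumWebBrowser(url))
        {
            // Wait for the first page load to finish.
            var response = await browser.WaitForInitialLoadAsync();
            if (!response.Success)
                throw new InvalidOperationException(
                    $"Load failed: {response.ErrorCode}, HTTP {response.HttpStatusCode}");

            IBrowser cefBrowser = browser.GetBrowser();
            var sb = new StringBuilder();

            // Markup of the top-level (xyz) document.
            sb.AppendLine(await cefBrowser.MainFrame.GetSourceAsync());

            // Frame lookup by name; method names for frame lookup have changed
            // between CefSharp releases, so check IBrowser in the installed version.
            IFrame frame = cefBrowser.GetFrame(frameName);
            if (frame != null)
                sb.AppendLine(await frame.GetSourceAsync());

            return sb.ToString();
        }
    }
}
```

A typical sequence would be `Cef.Initialize(new CefSettings());`, then `string html = await FrameSourceFetcher.FetchAsync("https://xyz.org", "frame1");`, then `Cef.Shutdown();` before the application exits.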

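Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.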

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM