C# - Get a rendered HTML page
I am trying to get the HTML string from my site as it is presented in the browser.
First I tried to use WebClient:
using (var client = new WebClient())
{
var content = client.DownloadString("my_site_address");
}
But my site has some JavaScript code that changes the view, and WebClient does not run JavaScript.
So I used the WPF WebBrowser control. After navigating to the desired site it shows the page as expected, but when I try to get the HTML string I get the same markup as with WebClient:
dynamic doc = MainBrowser.Document;
var htmlText = doc.documentElement.InnerHtml;
This is the HTML I get:
<!DOCTYPE html>
<head>
<title>Title</title>
</head>
<body>
<div class="conteiner">
<div class="matrix">
<script type="text/javascript">
// some script code
</script>
<script type="text/javascript" src="xxx"></script>
<a href="Matrix/index.html">Matrix</a>
</div>
<div class="zoom">
<a href="zoom/index.html">Zoom</a>
</div>
</div>
<div class="test">
<script type="text/javascript">
// some script code
</script>
<script type="text/javascript" src="xxx2"></script>
</div>
</body>
</html>
And this is what I should get after the JavaScript has changed it:
<html><head>
<title>Title</title>
</head>
<body>
<div class="conteiner">
<div class="matrix">
<script type="text/javascript">
</script>
<script type="text/javascript" src="xxx"></script><iframe ></iframe><script ></script><div ><div ><iframe >
<html><head>
<title></title>
</head>
<body>
<div >
<ul><li><ol><li <a </a></li></ol></li></ul> </div>
</body></html>
</iframe></div></div></div>
<a href="Matrix/index.html">Matrix </a>
</div>
<div class="zoom">
<a href="zoom/index.html">Zoom</a>
</div>
</div>
<div class="test">
<script type="text/javascript">
</script>
<script type="text/javascript" src="xxx2"></script><div ><div ><div ><iframe ></iframe></div></div></div>
</div>
</body></html>
Please help :)
You can use the WebDriver framework from Selenium. It offers different web driver implementations, such as for Internet Explorer or Firefox.
Here is some sample code that requests a web site with Internet Explorer, lets it render, and then saves the final HTML markup.
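using System;
using OpenQA.Selenium.Remote;

// Wraps a web driver: loads a page in a real browser and returns the markup after rendering.
public class WebSiteHtmlLoader : IDisposable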
{
private readonly RemoteWebDriver _remoteWebDriver;
public WebSiteHtmlLoader(RemoteWebDriver remoteWebDriver)
{
if (remoteWebDriver == null) throw new ArgumentNullException("remoteWebDriver");
_remoteWebDriver = remoteWebDriver;
}
public string GetRenderedHtml(Uri webSiteUri)
{
if (webSiteUri == null) throw new ArgumentNullException("webSiteUri");
_remoteWebDriver.Navigate().GoToUrl(webSiteUri);
return _remoteWebDriver.PageSource;
}
public void Dispose()
{
Dispose(true);
GC.SuppressFinalize(this);
}
private void Dispose(bool disposing)
{
if (disposing)
{
if (_remoteWebDriver != null)
{
_remoteWebDriver.Quit();
}
}
}
}
Usage:
using System;
using System.IO;
using System.Linq;
using OpenQA.Selenium.IE;

class Program
{
static void Main(string[] args)
{
if (!args.Any())
{
return;
}
var pageUrl = args.First();
var options = new InternetExplorerOptions
{
IntroduceInstabilityByIgnoringProtectedModeSettings = true,
PageLoadStrategy = InternetExplorerPageLoadStrategy.Eager
};
using (var htmlLoader = new WebSiteHtmlLoader(new InternetExplorerDriver(options)))
{
var html = htmlLoader.GetRenderedHtml(new Uri(pageUrl, UriKind.Absolute));
File.WriteAllText(@"C:\htmlloadertext.html", html);
}
}
}
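One thing to watch with this approach: PageSource is read right after navigation, so scripts that inject content later may not have finished yet. Below is a minimal sketch of waiting for a marker element first. It assumes the Selenium.Support package (for WebDriverWait) is referenced; RenderedHtmlWaiter, GetHtmlWhenPresent and the marker selector are hypothetical names for illustration, not part of Selenium.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper (not part of Selenium): waits for script-generated
// content to appear before reading the page source.
public static class RenderedHtmlWaiter
{
    public static string GetHtmlWhenPresent(
        IWebDriver driver, Uri pageUri, By marker, TimeSpan timeout)
    {
        if (driver == null) throw new ArgumentNullException("driver");
        if (pageUri == null) throw new ArgumentNullException("pageUri");
        if (marker == null) throw new ArgumentNullException("marker");

        driver.Navigate().GoToUrl(pageUri);

        // Poll until at least one element matches the marker.
        // FindElements returns an empty collection instead of throwing,
        // so the condition is simply false until the script has run.
        // If the timeout elapses, Until throws WebDriverTimeoutException.
        var wait = new WebDriverWait(driver, timeout);
        wait.Until(d => d.FindElements(marker).Count > 0);

        return driver.PageSource;
    }
}
```

With the page from the question, the marker could be something like By.CssSelector("div.matrix iframe"), since the iframe only exists after the script has run; that selector is a guess based on the markup shown above.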
You could try the WebBrowser.DocumentText property: add a hidden WebBrowser control to your application, call its Navigate() method, and then read the property to get the generated HTML. More info at: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.documenttext.aspx
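A rough sketch of that idea with the Windows Forms WebBrowser, reading the markup only after DocumentCompleted has fired. Two assumptions on my side: DocumentText may give you the markup as it was downloaded rather than the live DOM, so the sketch reads OuterHtml of the html element instead; and scripts that keep running after DocumentCompleted may still not be reflected. RenderedHtmlSketch is a made-up name.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical console entry point; needs a reference to System.Windows.Forms.
static class RenderedHtmlSketch
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main(string[] args)
    {
        if (args.Length == 0) return;

        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted also fires for every frame;
            // only react to the top-level document.
            if (e.Url != browser.Url) return;

            // Serialize the current DOM rather than the downloaded source.
            var roots = browser.Document.GetElementsByTagName("html");
            var html = roots.Count > 0
                ? roots[0].OuterHtml
                : browser.DocumentText; // fallback: markup as loaded

            Console.WriteLine(html);
            Application.ExitThread(); // stop the message loop started below
        };

        browser.Navigate(args[0]);
        Application.Run(); // message loop so the control can load and run scripts
        browser.Dispose();
    }
}
```

If the page builds its content asynchronously, you would still need some delay or polling before reading the DOM.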