
C# - Get a rendered HTML page

I am trying to get the HTML string from my site as it is presented in the browser.

First, I tried to use WebClient:

using (var client = new WebClient())
{
    var content = client.DownloadString("my_site_address");
}

But my site has some JavaScript code that changes the view, and WebClient does not run JavaScript.

So I used the WPF WebBrowser control. After navigating to the desired site it shows the page as expected, but when I try to get the HTML string I get the same markup as with WebClient:

        dynamic doc = MainBrowser.Document;
        var htmlText = doc.documentElement.InnerHtml;

This is the HTML I get:

<!DOCTYPE html>
  <head>
    <title>Title</title>
  </head>
  <body>
    <div class="conteiner">
        <div class="matrix">
            <script type="text/javascript"> 
                // some script code
            </script>
            <script type="text/javascript" src="xxx"></script>
            <a href="Matrix/index.html">Matrix</a>
        </div>
        <div class="zoom">
            <a href="zoom/index.html">Zoom</a>
        </div>
    </div>
        <div class="test">
            <script type="text/javascript"> 
                // some script code
            </script>
            <script type="text/javascript" src="xxx2"></script>
        </div>
  </body>
</html>

And this is what I should get after the JavaScript has changed it:

<html><head>
    <title>Title</title>
</head>
  <body>
  <div class="conteiner">
        <div class="matrix">
        <script type="text/javascript"> 
</script>
 <script type="text/javascript" src="xxx"></script><iframe ></iframe><script ></script><div ><div ><iframe >

<html><head>
        <title></title>
</head>
        <body>
            <div >
            <ul><li><ol><li <a </a></li></ol></li></ul>        </div>

</body></html>

 </iframe></div></div></div>
            <a href="Matrix/index.html">Matrix </a>
        </div>
        <div class="zoom">
            <a href="zoom/index.html">Zoom</a>
        </div>
        </div>
        <div class="test">
            <script type="text/javascript"> 

</script>
 <script type="text/javascript" src="xxx2"></script><div ><div ><div ><iframe ></iframe></div></div></div>
        </div>

</body></html>

Please help :)

You can use the WebDriver framework from Selenium. It offers different web driver implementations, for example for Internet Explorer or Firefox.

Here is some sample code that requests a web site with Internet Explorer, lets it render, and then returns the final HTML markup.

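using System;
using OpenQA.Selenium.Remote;

// Wraps a Selenium driver and returns the page source after the browser has loaded the page.
public class WebSiteHtmlLoader : IDisposable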
{
    private readonly RemoteWebDriver _remoteWebDriver;

    public WebSiteHtmlLoader(RemoteWebDriver remoteWebDriver)
    {
        if (remoteWebDriver == null) throw new ArgumentNullException("remoteWebDriver");
        _remoteWebDriver = remoteWebDriver;
    }

    public string GetRenderedHtml(Uri webSiteUri)
    {
        if (webSiteUri == null) throw new ArgumentNullException("webSiteUri");
        _remoteWebDriver.Navigate().GoToUrl(webSiteUri);

        return _remoteWebDriver.PageSource;
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    private void Dispose(bool disposing)
    {
        if (disposing)
        {
            if (_remoteWebDriver != null)
            {
                _remoteWebDriver.Quit();
            }
        }
    }
}

Usage:

using System;
using System.IO;
using System.Linq;
using OpenQA.Selenium.IE;

class Program
{
    static void Main(string[] args)
    {
        if (!args.Any())
        {
            return;
        }

        var pageUrl = args.First();
        var options = new InternetExplorerOptions
        {
            IntroduceInstabilityByIgnoringProtectedModeSettings = true,
            PageLoadStrategy = InternetExplorerPageLoadStrategy.Eager
        };

        using (var htmlLoader = new WebSiteHtmlLoader(new InternetExplorerDriver(options)))
        {
            var html = htmlLoader.GetRenderedHtml(new Uri(pageUrl, UriKind.Absolute));
            File.WriteAllText(@"C:\htmlloadertext.html", html);
        }
    }
}
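One caveat with reading PageSource right after GoToUrl: scripts that modify the page asynchronously may not have finished yet. A minimal sketch of an explicit wait follows, assuming the Selenium.Support package (for WebDriverWait) is referenced. The class name RenderedHtmlReader and the readySelector parameter are hypothetical and not part of the answer above.

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Support.UI;

// Hypothetical helper, not part of the original answer.
public static class RenderedHtmlReader
{
    // Navigates to pageUri, waits until at least one element matching
    // readySelector exists, then returns the current DOM as a string.
    public static string GetRenderedHtml(
        IWebDriver driver, Uri pageUri, string readySelector, TimeSpan timeout)
    {
        driver.Navigate().GoToUrl(pageUri);

        // Polls until the condition returns true; throws
        // WebDriverTimeoutException if the timeout elapses first.
        var wait = new WebDriverWait(driver, timeout);
        wait.Until(d => d.FindElements(By.CssSelector(readySelector)).Count > 0);

        // Ask the browser to serialize the live DOM, including
        // nodes that scripts inserted after the initial load.
        var js = (IJavaScriptExecutor)driver;
        return (string)js.ExecuteScript("return document.documentElement.outerHTML;");
    }
}
```

For the page in the question, a selector such as "div.matrix iframe" would probably be a reasonable readiness signal, since that iframe only appears in the markup after the script has run. The contents of an iframe live in a separate document, so they would likely have to be read separately after driver.SwitchTo().Frame(...).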

You could try the WebBrowser.DocumentText property: add a hidden WebBrowser control to your application, call Navigate(), and then read the property to get the generated HTML. More info at: http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.documenttext.aspx
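A rough sketch of that approach with the Windows Forms control, assuming a console-style entry point with an STA thread and a message loop. The URL is a placeholder. Reading Document.Body.Parent.OuterHtml in addition to DocumentText is an assumption of this sketch: as far as I know, DocumentText may reflect the downloaded source rather than the DOM after scripts have changed it.

```csharp
using System;
using System.Windows.Forms;

static class Program
{
    [STAThread] // the WebBrowser control requires a single-threaded apartment
    static void Main()
    {
        var browser = new WebBrowser { ScriptErrorsSuppressed = true };

        browser.DocumentCompleted += (sender, e) =>
        {
            // DocumentCompleted can fire once per frame; wait for the whole page.
            if (browser.ReadyState != WebBrowserReadyState.Complete) return;

            // The property suggested in the answer.
            string sourceHtml = browser.DocumentText;

            // Assumption: serializing the <html> element reads the live DOM.
            string domHtml = browser.Document.Body.Parent.OuterHtml;

            Console.WriteLine(domHtml);
            Application.ExitThread(); // stop the message loop started below
        };

        browser.Navigate("http://example.com/"); // placeholder URL
        Application.Run(); // pump messages so the control can load the page
    }
}
```

Scripts that keep changing the page after DocumentCompleted (timers, AJAX callbacks) would not be captured by this sketch; a short delay or a check for a known element may be needed in that case.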
