C#: download the HTML string after the page has finished loading

Question

I am trying to use a loop to download a number of HTML pages and scrape data from them. However, those pages run some JavaScript while loading, so I think WebClient may not be a good choice. But if I use a WebBrowser as shown below, it returns an empty HTML string after the first call in the loop.

WebBrowser wb = new WebBrowser();
wb.ScrollBarsEnabled = false;
wb.ScriptErrorsSuppressed = true;
wb.Navigate(url);
while (wb.ReadyState != WebBrowserReadyState.Complete)
{
    Application.DoEvents();
    Thread.Sleep(1000);
}
html = wb.Document.DomDocument.ToString();

Answer 1

You are correct that WebClient and all of the other HTTP client interfaces will completely ignore JavaScript; none of them are browsers, after all.

You want:

var html = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;

Note that if you load the page in a WebBrowser you don't need to scrape the raw markup; you can use DOM methods such as GetElementById, GetElementsByTagName, and so on.
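
As a rough illustration of querying the DOM instead of parsing markup, the sketch below collects the href of every link in a loaded document. DomQueries and GetLinkTargets are hypothetical names chosen for this example, and it assumes the HtmlDocument comes from a WebBrowser that has already finished loading.

```csharp
using System.Collections.Generic;
using System.Windows.Forms;

static class DomQueries
{
    // Hypothetical helper: returns the href of every <a> element
    // in an already-loaded document (e.g. wb.Document).
    public static List<string> GetLinkTargets(HtmlDocument doc)
    {
        var hrefs = new List<string>();
        foreach (HtmlElement link in doc.GetElementsByTagName("a"))
        {
            string href = link.GetAttribute("href");
            // Skip anchors that carry no href attribute.
            if (!string.IsNullOrEmpty(href))
            {
                hrefs.Add(href);
            }
        }
        return hrefs;
    }
}
```

It would typically be called from a DocumentCompleted handler, for example DomQueries.GetLinkTargets(wb.Document).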

The while loop is very VBScript; there is a DocumentCompleted event you should wire your code into.

private void Whatever()
{
    WebBrowser wb = new WebBrowser();
    wb.DocumentCompleted += Wb_DocumentCompleted;

    wb.ScriptErrorsSuppressed = true;
    wb.Navigate("http://stackoverflow.com");
}

private void Wb_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    var wb = (WebBrowser)sender;

    var html = wb.Document.GetElementsByTagName("HTML")[0].OuterHtml;
    var domd = wb.Document.GetElementById("copyright").InnerText;
    /* ... */
}
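
Since the question is about downloading several pages in a loop, here is one possible way to combine the event-based approach above with a loop. This is only a sketch: PageLoader, LoadHtmlAsync, and LoadAllAsync are hypothetical names, and it assumes the code is awaited on the WinForms UI thread (so continuations resume there) and that checking ReadyState is enough to tell the top-level document from its frames.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Windows.Forms;

static class PageLoader
{
    // Hypothetical helper: starts a navigation and completes the task
    // with the page markup once DocumentCompleted fires for the whole page.
    public static Task<string> LoadHtmlAsync(WebBrowser wb, string url)
    {
        var tcs = new TaskCompletionSource<string>();
        WebBrowserDocumentCompletedEventHandler handler = null;
        handler = (sender, e) =>
        {
            // DocumentCompleted can also fire for individual frames;
            // assumption: ReadyState is Complete only for the full page.
            if (wb.ReadyState != WebBrowserReadyState.Complete) return;

            wb.DocumentCompleted -= handler;
            string html = string.Empty;
            if (wb.Document != null)
            {
                var root = wb.Document.GetElementsByTagName("HTML");
                if (root.Count > 0) html = root[0].OuterHtml;
            }
            tcs.TrySetResult(html);
        };
        wb.DocumentCompleted += handler;
        wb.Navigate(url);
        return tcs.Task;
    }

    // Loads the pages one after another, reusing a single WebBrowser.
    public static async Task<List<string>> LoadAllAsync(WebBrowser wb, IEnumerable<string> urls)
    {
        var pages = new List<string>();
        foreach (var url in urls)
        {
            pages.Add(await LoadHtmlAsync(wb, url));
        }
        return pages;
    }
}
```

Each navigation is awaited before the next one starts, so no DoEvents/Sleep polling is needed. Note that scripts which keep modifying the page after DocumentCompleted fires may still not be reflected in the returned markup.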

Answer 1
3 accepted 2016-02-12 14:37:24
