
C# Web Scraping: Reading dynamically loaded AJAX content with the WebBrowser control

This is what I tried: I load a website into a WebBrowser control. The site loads more data as the user scrolls down.

The site loads this data dynamically via AJAX. I am trying to read all of the H3 tags that are loaded this way, but my code does not work, and I cannot figure out what I am missing.

Here is my code:

private void BrowserTest_Load(object sender, EventArgs e)
{
    webBrowser1.Navigate("https://www.pinterest.com/pin/517210338432366716/");
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
    {
        HtmlElement elm = webBrowser1.Document.GetElementById("h3"); // Looks up the element whose ID is "h3"
        //Console.WriteLine("elm.InnerHtml(DocumentCompleted):" + elm.InnerHtml);
        if (elm != null)
        {
            elm.AttachEventHandler("onpropertychange", new EventHandler(handler));
        }
    }
}

private void handler(Object sender, EventArgs e)
{
    HtmlElement div = webBrowser1.Document.GetElementById("h3");
    if (div == null) return;
    String contentLoaded = div.InnerHtml;
}

private void btnScrollDown_Click(object sender, EventArgs e)
{
    if (webBrowser1.Document != null)
    {
        webBrowser1.Document.Window.ScrollTo(0, webBrowser1.Document.Body.ScrollRectangle.Height);
    }
}

I am looking for suggestions on how to achieve this. Thanks.

I would take a different approach:

  1. Scroll the document to the bottom.

  2. Wait 100 ms (or 200 ms, 500 ms; your choice).

  3. Count the grid elements currently loaded in the document.

  4. Repeat from step 1 until the count of loaded grid elements has not changed for the last 5 seconds. At that point you have probably reached the end of the items, so collect all grid elements in the document.
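The steps above could be sketched roughly as follows. This is only a minimal sketch, not a tested solution for this particular site. It assumes the form is called BrowserTest and already has the webBrowser1 control from the question, and that the "grid elements" can be counted by their h3 tags. StableCounter, StartScrolling and the two constants are hypothetical names introduced for illustration. Note that it uses GetElementsByTagName("h3") rather than GetElementById("h3"), because h3 is a tag name, not an element ID.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper: reports true once the same count has been
// seen continuously for at least the given timeout.
public sealed class StableCounter
{
    private readonly TimeSpan timeout;
    private int lastCount = -1;
    private DateTime lastChange;

    public StableCounter(TimeSpan timeout) { this.timeout = timeout; }

    public bool Update(int count, DateTime now)
    {
        if (count != lastCount)
        {
            // Count changed (or first call): restart the stability window.
            lastCount = count;
            lastChange = now;
            return false;
        }
        return now - lastChange >= timeout;
    }
}

// Assumes the designer-generated part of the form declares webBrowser1.
public partial class BrowserTest : Form
{
    private const int PollIntervalMs = 500;   // step 2: wait between scrolls (assumed value)
    private const int StableSeconds = 5;      // step 4: "no change" window

    private readonly System.Windows.Forms.Timer scrollTimer = new System.Windows.Forms.Timer();
    private StableCounter counter;

    private void StartScrolling()
    {
        counter = new StableCounter(TimeSpan.FromSeconds(StableSeconds));
        scrollTimer.Interval = PollIntervalMs;
        scrollTimer.Tick -= ScrollTimer_Tick; // avoid subscribing twice
        scrollTimer.Tick += ScrollTimer_Tick;
        scrollTimer.Start();
    }

    private void ScrollTimer_Tick(object sender, EventArgs e)
    {
        HtmlDocument doc = webBrowser1.Document;
        if (doc == null || doc.Body == null) return;

        // Step 1: scroll to the bottom so the page requests more items.
        doc.Window.ScrollTo(0, doc.Body.ScrollRectangle.Height);

        // Step 3: count what is loaded so far (h3 tags, as an assumption).
        HtmlElementCollection headings = doc.GetElementsByTagName("h3");

        // Step 4: stop once the count has been stable long enough.
        if (counter.Update(headings.Count, DateTime.UtcNow))
        {
            scrollTimer.Stop();
            foreach (HtmlElement h3 in headings)
            {
                Console.WriteLine(h3.InnerText);
            }
        }
    }
}
```

StartScrolling() could be called from webBrowser1_DocumentCompleted once the page has loaded, or from a button click such as btnScrollDown_Click. A Windows Forms timer is used instead of Thread.Sleep so that the UI thread is not blocked and the browser can keep running the page's scripts between ticks.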
