C# Web Scraping: Reading dynamically loaded AJAX content with the WebBrowser control

This is what I tried. I load a web site into a WebBrowser control. The site loads more data when the user scrolls down.

The site loads its data dynamically via AJAX. I am trying to read all of the H3 tags that are loaded by AJAX, but my code does not work, and I cannot figure out what I am missing.

Here is my code:

private void BrowserTest_Load(object sender, EventArgs e)
{
    webBrowser1.Navigate("https://www.pinterest.com/pin/517210338432366716/");
}

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    if (webBrowser1.ReadyState == WebBrowserReadyState.Complete)
    {
        HtmlElement elm = webBrowser1.Document.GetElementById("h3"); // Look up the element whose id is "h3"
        //Console.WriteLine("elm.InnerHtml(DocumentCompleted):" + elm.InnerHtml);
        if (elm != null)
        {
            elm.AttachEventHandler("onpropertychange", new EventHandler(handler));
        }
    }
}

private void handler(Object sender, EventArgs e)
{
    HtmlElement div = webBrowser1.Document.GetElementById("h3");
    if (div == null) return;
    String contentLoaded = div.InnerHtml;
}

private void btnScrollDown_Click(object sender, EventArgs e)
{
    if (webBrowser1.Document != null)
    {
        webBrowser1.Document.Window.ScrollTo(0, webBrowser1.Document.Body.ScrollRectangle.Height);
    }
}

I am looking for suggestions on how to achieve my goal. Thanks.

I would take a different approach:

  1. Scroll the document to the bottom.

  2. Wait 100 ms (or 200 ms, 500 ms; your choice).

  3. Count the total number of loaded grid elements in the document.

  4. Repeat from step 1 until the count of loaded grid elements has not changed for the last 5 seconds. At that point you have probably reached the end of the items, so get all grid elements in the document.
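
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions: `ScrollPoller` and `ScrollUntilStableAsync` are hypothetical names, not part of the original answer, and the scrolling and counting are passed in as delegates so the loop itself does not depend on any particular page structure.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper illustrating the scroll / wait / count loop.
public static class ScrollPoller
{
    // scrollToBottom: scrolls the page to its current bottom (step 1).
    // countItems:     returns how many target elements are loaded right now (step 3).
    // pollDelay:      how long to wait after each scroll (step 2).
    // stableFor:      how long the count must stay unchanged before stopping (step 4).
    // Returns the last observed item count.
    public static async Task<int> ScrollUntilStableAsync(
        Action scrollToBottom,
        Func<int> countItems,
        TimeSpan pollDelay,
        TimeSpan stableFor)
    {
        int lastCount = countItems();
        Stopwatch sinceLastChange = Stopwatch.StartNew();

        while (sinceLastChange.Elapsed < stableFor)
        {
            scrollToBottom();               // step 1
            await Task.Delay(pollDelay);    // step 2: waits without blocking the caller's thread

            int count = countItems();       // step 3
            if (count != lastCount)
            {
                // New items arrived, so restart the idle timer.
                lastCount = count;
                sinceLastChange.Restart();
            }
        }

        return lastCount;
    }
}
```

In the form from the question, this might be called from an `async` event handler, for example `await ScrollPoller.ScrollUntilStableAsync(() => webBrowser1.Document.Window.ScrollTo(0, webBrowser1.Document.Body.ScrollRectangle.Height), () => webBrowser1.Document.GetElementsByTagName("h3").Count, TimeSpan.FromMilliseconds(200), TimeSpan.FromSeconds(5));`. Treating the `h3` tags as the "grid elements" is an assumption based on the question. Once the call returns, the `h3` elements can be collected with `GetElementsByTagName("h3")`. Because the loop awaits `Task.Delay` instead of sleeping, the continuation should resume on the UI thread in a WinForms handler, which is where the WebBrowser control has to be accessed.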
