C# WebBrowser: How can I wait for JavaScript that runs after the document has finished loading to finish running?

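I'm working on a project that involves scraping some product data from a vendor's website (with their blessing, but not their help). I work in a C# shop, so I'm using the .NET Windows Forms WebBrowser control.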

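I'm handling the DocumentCompleted event, but I find that I have to make the thread sleep for a little while first; otherwise the data doesn't show up where I expect it in the DOM.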

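Looking at the JavaScript on the page, I can see that it dynamically changes existing DOM content (setting someDomElement.innerHTML) after the page has finished loading. It isn't making any AJAX calls; it uses data that is already present in the original page load. (I could try to parse that data myself, but it's embedded in the JavaScript and somewhat obfuscated.) So evidently I'm getting the DocumentCompleted event before the JavaScript has finished running.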

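Eventually there could be a lot of pages to scrape, so waiting half a second or so on each one is far from ideal. I'd like to wait only until all the JavaScript that starts on document ready/page load has finished running, and then inspect the page. Does anyone know a way to do this?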

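I'd have thought the DocumentCompleted event shouldn't fire until then, right? But it definitely does. Maybe some of the page's JavaScript is using setTimeout. Is there a way to tell whether any timeouts are pending?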

Thanks for your help!

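You could:

  1. Assuming the way the data is parsed never changes, look at how the JavaScript processes the data and do the same on your end, so you can retrieve the data as soon as the page loads.
  2. Inject JavaScript into the web page that detects DOM modifications, so you know when to read the data from C#.
  3. Write a pure JavaScript solution with PhantomJS.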
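Option 1 can be sketched in C# roughly as follows, assuming the page assigns its data to a JavaScript variable as an object literal (the variable name `productData` and the helper `EmbeddedData.ExtractObjectLiteral` are hypothetical). It pulls the literal's source text out of the raw HTML by counting braces, so no script has to run at all. Comments and template literals inside the script are not handled, and if the data is obfuscated, the page's own decoding step would still have to be reproduced on top of this.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmbeddedData
{
    // Hypothetical helper: returns the source text of the object literal assigned to
    // `variableName` (e.g. `var productData = { ... };`), or null if none is found.
    // Braces inside single- or double-quoted strings are ignored; comments and
    // template literals are NOT handled (an assumption about the page's script).
    public static string ExtractObjectLiteral(string html, string variableName)
    {
        var assignment = new Regex(@"\b" + Regex.Escape(variableName) + @"\s*=\s*\{");
        Match m = assignment.Match(html);
        if (!m.Success) return null;

        int start = m.Index + m.Length - 1;   // index of the opening '{'
        int depth = 0;
        char quote = '\0';                    // current string delimiter, '\0' outside strings
        for (int i = start; i < html.Length; i++)
        {
            char c = html[i];
            if (quote != '\0')
            {
                if (c == '\\') i++;           // skip the escaped character
                else if (c == quote) quote = '\0';
                continue;
            }
            if (c == '"' || c == '\'') quote = c;
            else if (c == '{') depth++;
            else if (c == '}' && --depth == 0)
                return html.Substring(start, i - start + 1);
        }
        return null;                          // braces never balanced
    }
}
```

In the scraper, `html` would be the raw page source, for example the WebBrowser control's `DocumentText` property read inside an Invoke. Turning the extracted text into usable values (with a JSON parser, if it happens to be valid JSON) is left out.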

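For posterity (or anyone else who finds this question), what I ended up doing was writing a function that waits, up to a specified timeout, for a specific element matching a given set of criteria to appear on the page, and then returns that HtmlElement. It checks the browser DOM periodically, looking for that element. It is meant to be called by a scraper worker running on a background thread, and it uses an Invoke each time it checks, so that the browser DOM is always accessed on the UI thread.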

    // Needs: using System; using System.Threading; using System.Windows.Forms;
    // scraperForm, Log and TagHasClass are defined elsewhere in the scraper class (not shown).

    /// <summary>
    /// Waits for a tag that matches the given criteria to show up on the page.
    ///
    /// Note: This function returns a browser DOM element owned by the foreground (UI) thread, while this scraper runs in a background thread,
    /// so use an Invoke [ scraperForm.Browser.Invoke(new Action(() => { ... })); ] when doing anything with the returned DOM element.
    /// </summary>
    /// <param name="tagName">The type of tag, or "" if all tags are to be searched.</param>
    /// <param name="id">The id of the tag, or "" if the search is not to be by id.</param>
    /// <param name="className">The class name of the tag, or "" if the search is not to be by class name.</param>
    /// <param name="keyContent">A string to search the tag's content for, or "" to accept any content.</param>
    /// <param name="timeout">The maximum time to wait, in milliseconds.</param>
    /// <returns>The first tag to match the criteria, or null if no such tag was found before the timeout expired.</returns>
    public HtmlElement WaitForTag(string tagName, string id, string className, string keyContent, int timeout) {
        Log(string.Format("WaitForTag('{0}','{1}','{2}','{3}',{4}) --", tagName, id, className, keyContent, timeout));
        const int pollInterval = 200;  // milliseconds between DOM checks
        HtmlElement result = null;
        int timeleft = timeout;
        while (timeleft > 0) {
            // Access the DOM on the foreground thread using an Invoke call.
            // (Required by the WebBrowser control; otherwise cryptic errors such as "invalid cast" result.)
            scraperForm.Browser.Invoke(new Action(() => {
                HtmlDocument doc = scraperForm.CurrentDocument;
                if (doc == null) return;  // no document loaded yet; try again on the next poll
                if (id == "") {
                    // No id was supplied, so get tags by tag name if one was supplied, otherwise get all tags.
                    HtmlElementCollection elements = (tagName == "") ? doc.All : doc.GetElementsByTagName(tagName);
                    // Find the first tag that matches the class name (if given) and contains the given content (if any).
                    foreach (HtmlElement element in elements) {
                        if (element == null) continue;
                        if (className != "" && !TagHasClass(element, className)) continue;
                        if (keyContent == "" ||
                            (element.InnerText != null && element.InnerText.Contains(keyContent)) ||
                            (tagName == "input" && element.GetAttribute("value").Contains(keyContent)) ||
                            (tagName == "img" && element.GetAttribute("src").Contains(keyContent)) ||
                            (element.OuterHtml != null && element.OuterHtml.Contains(keyContent)))
                        {
                            result = element;
                            break;  // stop at the first match, as documented above
                        }
                    }
                }
                else {
                    // An id was supplied, so get the tag by id.
                    HtmlElement element = doc.GetElementById(id);
                    // Make sure it matches any given class name and contains any given content.
                    if (element != null
                        && (className == "" || TagHasClass(element, className))
                        && (keyContent == "" || (element.InnerText != null && element.InnerText.Contains(keyContent))))
                    {
                        result = element;
                    }
                }
            }));
            if (result != null) break;   // the tag we were waiting for appeared; stop looping
            // Note: keep sleeps like this outside of Invokes to the foreground thread, so they only pause this background thread.
            Thread.Sleep(pollInterval);
            timeleft -= pollInterval;
        }
        return result;
    }
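The polling part of `WaitForTag` does not depend on the WebBrowser control, so it can be pulled out into a small generic helper. A minimal sketch, assuming a Stopwatch-based deadline instead of the decrementing `timeleft` counter (which does not count the time spent inside each Invoke); the names `Poll` and `WaitFor` are hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class Poll
{
    // Calls `probe` until it returns a non-null value or `timeoutMs` milliseconds
    // have elapsed, sleeping `intervalMs` between attempts. Returns the value found,
    // or null on timeout. The probe always runs at least once, even with timeoutMs <= 0.
    public static T WaitFor<T>(Func<T> probe, int timeoutMs, int intervalMs = 200) where T : class
    {
        var clock = Stopwatch.StartNew();
        while (true)
        {
            T result = probe();
            if (result != null) return result;

            long remaining = timeoutMs - clock.ElapsedMilliseconds;
            if (remaining <= 0) return null;

            // Sleep on the calling (background) thread, but never past the deadline.
            Thread.Sleep((int)Math.Min(intervalMs, remaining));
        }
    }
}
```

In `WaitForTag`, the probe would be the Invoke block that searches the DOM and yields the matching `HtmlElement` or null; the sleep stays outside the Invoke, as the original comment recommends, so the UI thread keeps running the page's scripts while the scraper waits.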

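Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.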

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM