C# WebBrowser: How can I wait for JavaScript that runs after the document has finished loading to finish?
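I'm working on a project that involves scraping some product data from a vendor's website (with their blessing, but not their help). I work in a C# shop, so I'm using the .NET Windows Forms WebBrowser control.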
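I'm responding to the DocumentCompleted event, but I'm finding that I have to pause for a bit first, or else the data doesn't show up where I expect it to be in the DOM.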
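Looking at the JavaScript on the page, I can see that it dynamically changes existing DOM content (setting someDomElement.innerHTML) after the page has finished loading. It isn't making any AJAX calls; it uses data that is already present in the original page load. (I could try to parse that data, but it is embedded in JavaScript and somewhat obfuscated.) So evidently I'm somehow getting the DocumentCompleted event before the JavaScript has finished running.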
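There may eventually be a lot of pages to scrape, so waiting half a second or so on each one is far from ideal. I'd like to wait only until all the JavaScript that starts on document ready/page load has finished running before I inspect the page. Does anyone know a way to do that?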
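I would have thought the DocumentCompleted event shouldn't fire before then, right? But it definitely does. Perhaps some of the page's JavaScript uses setTimeout. Is there a way to tell whether any timeouts are pending?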
Thanks for your help!
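For posterity, and for anyone else who comes across this question: what I ended up doing was writing a function that waits, up to a specified timeout, for a particular element (matching a given set of criteria) to show up on the page, and then returns it as an HtmlElement. It periodically checks the browser DOM for that element. It is designed to be called by a scraper worker running in a background thread; each check uses an Invoke call to access the browser DOM.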
/// <summary>
/// Waits for a tag that matches the given criteria to show up on the page.
///
/// Note: This function returns a browser DOM element from the foreground thread, and this scraper is running in a background thread,
/// so use an invoke [ scraperForm.Browser.Invoke(new Action(()=>{ ... })); ] when doing anything with the returned DOM element.
/// </summary>
/// <param name="tagName">The type of tag, or "" if all tags are to be searched.</param>
/// <param name="id">The id of the tag, or "" if the search is not to be by id.</param>
/// <param name="className">The class name of the tag, or "" if the search is not to be by class name.</param>
/// <param name="keyContent">A string to search the tag's innerText for, or "" to skip the content check.</param>
/// <param name="timeout">The maximum time to wait, in milliseconds.</param>
/// <returns>The first tag to match the criteria, or null if such a tag was not found after the timeout period.</returns>
public HtmlElement WaitForTag(string tagName, string id, string className, string keyContent, int timeout) {
    Log(string.Format("WaitForTag('{0}','{1}','{2}','{3}',{4}) --", tagName, id, className, keyContent, timeout));
    HtmlElement result = null;
    int timeleft = timeout;
    while (timeleft > 0) {
        // Access the DOM in the foreground thread using an Invoke call.
        // (required by the WebBrowser control, otherwise cryptic errors result, like "invalid cast")
        scraperForm.Browser.Invoke(new Action(() => {
            HtmlDocument doc = scraperForm.CurrentDocument;
            if (doc == null) return; // no document loaded yet; try again on the next pass
            if (id == "") {
                // no id was supplied, so get tags by tag name if a tag name was supplied, or get all the tags
                HtmlElementCollection elements = (tagName == "") ? doc.All : doc.GetElementsByTagName(tagName);
                // find the first tag that matches the class name (if given) and contains the given content (if any)
                // (TagHasClass is a helper from the scraper that is not shown in this post)
                foreach (HtmlElement element in elements) {
                    if (element == null) continue;
                    if (className != "" && !TagHasClass(element, className)) continue;
                    if (keyContent == "" ||
                        (element.InnerText != null && element.InnerText.Contains(keyContent)) ||
                        (tagName == "input" && element.GetAttribute("value").Contains(keyContent)) ||
                        (tagName == "img" && element.GetAttribute("src").Contains(keyContent)) ||
                        (element.OuterHtml != null && element.OuterHtml.Contains(keyContent)))
                    {
                        result = element;
                        break; // stop at the first match, as the doc comment promises
                    }
                }
            }
            else {
                // an id was supplied, so get the tag by id
                HtmlElement element = doc.GetElementById(id);
                // make sure it matches any given class name and contains any given content
                if (element != null
                    && (className == "" || TagHasClass(element, className))
                    && (keyContent == "" ||
                        (element.InnerText != null && element.InnerText.Contains(keyContent))))
                {
                    result = element;
                }
            }
        }));
        if (result != null) break; // the searched-for tag appeared, break out of the loop
        // Note: Make sure sleeps like this are outside of invokes to the foreground thread, so they only pause this background thread.
        Thread.Sleep(200); // wait 200 more milliseconds and continue looping
        timeleft -= 200;
    }
    return result;
}
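The polling idea in the function above can also be separated from the DOM-specific matching. What follows is only a minimal sketch under stated assumptions, not the code from the original scraper: the names Poller, WaitFor and HasClass are hypothetical, and HasClass is just one guess at what a helper like TagHasClass might do with the value of an element's className attribute. Unlike the loop above, it measures elapsed time with a Stopwatch, so time spent inside each check counts toward the timeout.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper class; not part of the original post.
public static class Poller
{
    // Calls probe repeatedly until it returns a non-null value or the timeout elapses.
    // Returns the value found, or null if nothing turned up in time.
    public static T WaitFor<T>(Func<T> probe, TimeSpan timeout, TimeSpan interval) where T : class
    {
        Stopwatch clock = Stopwatch.StartNew();
        while (true)
        {
            T found = probe();
            if (found != null) return found;
            if (clock.Elapsed >= timeout) return null;
            // Sleep on the calling (background) thread only, never inside a UI-thread Invoke.
            Thread.Sleep(interval);
        }
    }

    // Assumed behaviour for a class check: true when the whitespace-separated
    // class attribute contains className as a whole token.
    public static bool HasClass(string classAttribute, string className)
    {
        if (string.IsNullOrEmpty(classAttribute) || string.IsNullOrEmpty(className)) return false;
        string[] tokens = classAttribute.Split(
            new[] { ' ', '\t', '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        return Array.IndexOf(tokens, className) >= 0;
    }
}
```

In the scraper, the probe passed to WaitFor would wrap the DOM lookup in scraperForm.Browser.Invoke, the same way WaitForTag does, so that the DOM is only ever touched on the foreground thread.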