Using WebBrowser with Control.Invoke

I am developing a Windows application for web scraping. To do this, I use the WebBrowser control; I can't use the WebRequest/WebClient/WebResponse classes because the web pages are loaded dynamically using JavaScript.
The application works fine, but because I do a lot of processing, it puts unnecessary load on the UI thread, and the window intermittently shows the "Not Responding" message. So this is what I did:

1. Create the WebBrowser on the UI thread.
2. Put the long-running processing on a background thread.
3. Whenever I need to get the page's document, use Control.Invoke.
4. Return the page's document from the Invoke call to the background thread.

In the callback function, I can see that the page's document is extracted fine. However, the document (HtmlDocument) returned to the background worker is not evaluated correctly: when I step through the debugger, I get a "Function evaluation timed out" message. I've played around with the syntax and keep getting InvalidCastException or cross-thread exceptions.
Below is how I've coded the callback/delegate:

private delegate HtmlDocument RefreshDelegate();

private HtmlDocument RefreshBrowser()
{
    WebBrowser br1 = (WebBrowser)this.Controls["br1"]; // get the WebBrowser named "br1"
    br1.Refresh();                                      // refresh the browser
    return br1.Document;                                // is retrieved correctly
}


Now for the code in the background worker that processes the returned HtmlDocument:

WebBrowser br1 = ((WebBrowser)this.Controls["br1"]); //get the browser
HtmlDocument document = (HtmlDocument)br1.Invoke(new RefreshDelegate(this.RefreshBrowser));  //not evaluated 
//do stuff with document


The debugger message I encounter is: "Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation." Is this the correct way to solve this problem? As I said, I can't get the JavaScript content with WebRequest etc., and I also can't run the HtmlDocument parsing on the UI thread, because it results in a poor user experience. Additionally, I need to create several WebBrowser instances. If this is not the best way, I'm open to other libraries as well. Thanks.

This happens because the WebBrowser methods you call in the worker thread (or that the debugger calls when it evaluates an expression) don't actually run on that thread. WebBrowser is an apartment-threaded COM component, so COM automatically marshals calls from the worker thread back to the UI thread. This doesn't work in the debugger because the debugger has frozen the UI thread, so the marshaled call cannot complete and the evaluation times out.
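Given that marshaling, one hedged mitigation inside the question's existing Control.Invoke design is to do all DOM access in the invoked method and return a plain string, so the worker never holds the COM-backed HtmlDocument and its later parsing makes no marshaled calls. This is a sketch, not the answerer's code: `ScraperForm`, `GetRenderedHtml` and the start URL are hypothetical, and only `br1` comes from the question. The DOM access itself still runs on the UI thread.

```csharp
using System;
using System.ComponentModel;
using System.Windows.Forms;

// Hypothetical form: a WebBrowser named "br1" (as in the question) plus a BackgroundWorker.
public class ScraperForm : Form
{
    private readonly WebBrowser br1 = new WebBrowser { Name = "br1", Dock = DockStyle.Fill };
    private readonly BackgroundWorker worker = new BackgroundWorker();
    private readonly Uri start;

    public ScraperForm(Uri start)
    {
        this.start = start;
        Controls.Add(br1);
        worker.DoWork += Worker_DoWork;
        br1.DocumentCompleted += (s, e) =>
        {
            // Start processing once the top-level document has finished loading.
            if (br1.ReadyState == WebBrowserReadyState.Complete && !worker.IsBusy)
                worker.RunWorkerAsync();
        };
    }

    protected override void OnLoad(EventArgs e)
    {
        base.OnLoad(e);
        br1.Navigate(start);
    }

    // Runs on the UI thread via Invoke: all DOM access happens here, and only a
    // managed string (not the COM-backed HtmlDocument) is handed back.
    private string GetRenderedHtml()
    {
        HtmlElement body = br1.Document?.Body;
        return body?.Parent?.OuterHtml ?? string.Empty; // Body.Parent is the <html> element
    }

    // Runs on a thread-pool thread.
    private void Worker_DoWork(object sender, DoWorkEventArgs e)
    {
        string html = (string)Invoke(new Func<string>(GetRenderedHtml));
        e.Result = html; // placeholder: parse `html` here without touching COM objects
    }
}
```

It would be started from an `[STAThread]` `Main` with, for example, `Application.Run(new ScraperForm(new Uri("https://example.com/")));`.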

There is nothing you can do about the debugger behavior, and even outside the debugger, having these calls run on the UI thread still leaves you open to UI freezes. The only cure for that is to run the browser entirely on its own STA thread. You won't be able to see it, but I imagine that shouldn't be an issue. Check this answer for the code you'll need.
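The code from that linked answer is not reproduced here. As a rough sketch of the same idea (not that answer's code), the helper below creates a WebBrowser on a dedicated STA thread, runs a message loop there with `Application.Run()`, and completes a task with the rendered markup once the page has loaded. `StaBrowser` and `GetRenderedHtmlAsync` are hypothetical names. It assumes the content you need is present when `DocumentCompleted` reports `ReadyState.Complete`; content that scripts fetch later may need additional waiting.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper (name and shape are illustrative, not from any library).
public static class StaBrowser
{
    // Loads `url` in a WebBrowser that lives on its own STA thread and completes
    // with the rendered markup. All WebBrowser/HtmlDocument calls run on that
    // thread, so the UI thread is never involved.
    public static Task<string> GetRenderedHtmlAsync(Uri url)
    {
        // Run continuations elsewhere, not inline inside the browser's event handler.
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

        var thread = new Thread(() =>
        {
            using (var browser = new WebBrowser { ScriptErrorsSuppressed = true })
            {
                browser.DocumentCompleted += (sender, e) =>
                {
                    // DocumentCompleted also fires for frames; wait for the whole page.
                    if (browser.ReadyState != WebBrowserReadyState.Complete) return;
                    try
                    {
                        // Body.Parent is the <html> element. Its OuterHtml reflects the
                        // DOM after scripts ran, unlike DocumentText (the raw source).
                        tcs.TrySetResult(browser.Document.Body.Parent.OuterHtml);
                    }
                    catch (Exception ex)
                    {
                        tcs.TrySetException(ex);
                    }
                    finally
                    {
                        Application.ExitThread(); // stops the message loop started below
                    }
                };
                browser.Navigate(url);
                Application.Run(); // pumps messages so navigation and scripts can run
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // WebBrowser requires an STA thread
        thread.IsBackground = true;
        thread.Start();
        return tcs.Task;
    }
}
```

From a BackgroundWorker's `DoWork`, `string html = StaBrowser.GetRenderedHtmlAsync(new Uri("https://example.com/")).Result;` blocks only the worker thread. Because each call creates its own thread and browser, several pages can be loaded at once, which fits the need for several WebBrowser instances, at the cost of one thread per page.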

I would suggest using the HtmlAgilityPack. This is specifically designed for web "scraping".

http://htmlagilitypack.codeplex.com/
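HtmlAgilityPack parses HTML but does not execute JavaScript, so on its own it cannot see content that scripts add after the page loads. A minimal sketch, assuming the rendered markup has already been copied out of the browser as a string: the parsing then runs on any thread without touching the browser. `LinkExtractor` and `ExtractLinks` are hypothetical names; `HtmlDocument.LoadHtml`, `DocumentNode.SelectNodes` and `GetAttributeValue` are HtmlAgilityPack calls.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

// Hypothetical helper: pulls every href out of an HTML string.
public static class LinkExtractor
{
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument(); // HtmlAgilityPack.HtmlDocument, not the WinForms one
        doc.LoadHtml(html);

        var links = new List<string>();
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//a[@href]");
        if (nodes == null) return links;

        foreach (HtmlNode node in nodes)
            links.Add(node.GetAttributeValue("href", string.Empty));
        return links;
    }
}
```

If a file also imports `System.Windows.Forms`, the two `HtmlDocument` types clash, so one of them has to be fully qualified.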
