Using WebBrowser with Control.Invoke

I am developing a Windows application for web scraping. To do this, I use the WebBrowser control; I can't use the WebRequest/WebClient/WebResponse classes because the web pages are loaded dynamically using JavaScript.
The application works fine, but since I do a lot of processing, it loads the UI thread unnecessarily and I intermittently get the "not responding" message. So what I did is:

1. Create the WebBrowser on the UI thread
2. Put the long-running processes on a background thread
3. Whenever I need to get the page's document, I use Control.Invoke
4. Return the page's document via the Invoke call to the background thread

In the callback function, I can see that the page's document is extracted fine. However, the document (HtmlDocument) returned to the background worker is not evaluated correctly. When I step through it in the debugger, I get a "Function evaluation timed out" message. I've played around with the syntax and keep getting either an invalid cast exception or a cross-thread exception.
Below is how I've coded the callback/delegate:

private delegate HtmlDocument RefreshDelegate();

private HtmlDocument RefreshBrowser()
{
    WebBrowser br1 = ((WebBrowser)this.Controls["br1"]); //get webbrowser, "br1"
    br1.Refresh(); //refresh browser
    return br1.Document; //is retrieved correctly
}


Now for the code in the background worker that processes the "returned" HtmlDocument:

WebBrowser br1 = ((WebBrowser)this.Controls["br1"]); //get the browser
HtmlDocument document = (HtmlDocument)br1.Invoke(new RefreshDelegate(this.RefreshBrowser));  //not evaluated 
//do stuff with document


Debugger message encountered: "Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation."

Is this the correct way to solve this problem? As I said, I can't get the JavaScript-generated content with WebRequest etc., and I can't run the HtmlDocument parsing on the UI thread because it results in a poor user experience. Additionally, I need to create several WebBrowser instances. If this is not the best way, I'm open to other libraries as well. Thanks.

This happens because the WebBrowser methods you call in the worker thread or the debugger thread don't actually run on that thread. WebBrowser is an apartment-threaded COM component, so COM automatically marshals calls from the worker back to the UI thread. This doesn't work well in the debugger because the UI thread is frozen by the debugger.

There is nothing you can do about that, and having these calls actually run on the UI thread still leaves you open to UI freezes. The only cure is to run the browser completely on its own STA thread. You won't be able to look at it, but I imagine that shouldn't be an issue. Check this answer for the code you'll need.
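The linked answer is not reproduced here. As a rough sketch of the general idea only, the code below hosts a WebBrowser on a dedicated STA thread with its own message loop, so neither the control nor the document processing touches the UI thread. The class name BrowserThread and the onLoaded callback are hypothetical names made up for this illustration. It also assumes the page is ready when DocumentCompleted fires, which may not hold for content that scripts load later.

```csharp
using System;
using System.Threading;
using System.Windows.Forms;

// Hypothetical helper (not from the original post): runs one WebBrowser
// on its own STA thread so the UI thread is never involved.
public sealed class BrowserThread
{
    private readonly Thread thread;

    // onLoaded is called on the browser's own thread, not on the caller's.
    public BrowserThread(string url, Action<HtmlDocument> onLoaded)
    {
        thread = new Thread(() =>
        {
            // The control is created on this thread, so its COM calls stay here.
            using (var browser = new WebBrowser())
            {
                browser.ScriptErrorsSuppressed = true;
                browser.DocumentCompleted += (sender, e) =>
                {
                    // The event also fires for frames; skip everything
                    // except the top-level document.
                    if (e.Url != browser.Url) return;
                    onLoaded(browser.Document);
                    Application.ExitThread(); // ends the message loop below
                };
                browser.Navigate(url);
                Application.Run(); // message loop required by the control
            }
        });
        thread.SetApartmentState(ApartmentState.STA); // must be set before Start
        thread.IsBackground = true;
        thread.Start();
    }
}
```

Because the HtmlDocument belongs to that STA thread, the callback should finish its work with the document there, or copy out plain strings, rather than hand the HtmlDocument object to another thread. Creating one such object per URL gives several independent browser instances.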

I would suggest using the HtmlAgilityPack. It is specifically designed for web "scraping".

http://htmlagilitypack.codeplex.com/
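HtmlAgilityPack parses HTML text but does not run JavaScript, so for dynamically loaded pages it still needs the rendered markup from somewhere. One possible combination, sketched below as an assumption rather than something either answer spells out, is to copy the browser's current DOM into a plain string through Control.Invoke and then parse that string on the background thread. The class ScrapeHelpers and its method names are hypothetical, and extracting link targets is just an example task.

```csharp
using System;
using System.Collections.Generic;
using System.Windows.Forms;

// Hypothetical helpers (names invented for this sketch).
public static class ScrapeHelpers
{
    // Call from the worker thread. Only a plain string crosses the
    // thread boundary, never the COM-backed HtmlDocument.
    public static string SnapshotHtml(WebBrowser browser)
    {
        return (string)browser.Invoke(new Func<string>(() =>
            browser.Document == null || browser.Document.Body == null
                ? string.Empty
                : browser.Document.Body.OuterHtml));
    }

    // Runs entirely on the calling thread; no WebBrowser objects involved.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var links = new List<string>();
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links; // SelectNodes returns null on no match

        foreach (var a in anchors)
            links.Add(a.GetAttributeValue("href", string.Empty));
        return links;
    }
}
```

With this split, the UI thread only pays for the short Invoke that copies the markup, and the slow parsing happens on the worker against an ordinary managed object.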
