
Alternative to VB.NET WebBrowser Control for Crawler

I'm volunteering for an amazing site with pretty limited resources that needs help with a bottleneck in its site crawler. The crawler is written in VB.NET using the WebBrowser control, and it crawls a single site, scraping data (with the knowledge and permission of that site). I believe I've found a possible solution to the bottleneck by accessing HTTPOnly cookies with this technique. However, I'm wondering whether there is a more efficient alternative to the WebBrowser control that could do the job and still access HTTPOnly cookies.

The core requirements are:

  • Ability to send/receive session info (login is required to access the data)
  • Access to HTTPOnly cookies
  • Capture HTML and XHR responses only (JS, images, CSS, etc. must not be downloaded, since that at least triples the average response time for the HTML)

Check out the System.Net assembly:

http://msdn.microsoft.com/en-us/library/ms172307.aspx

It should cover all your use cases.
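
To make that concrete, here is a minimal sketch of how the three requirements might be met with HttpWebRequest and a shared CookieContainer from System.Net. The URLs, the form field names, and the Fetch helper are all hypothetical placeholders, not details of the actual site or crawler, and real login flows often need extra steps (hidden form tokens, redirects) that are not shown.

```vb
Imports System
Imports System.IO
Imports System.Net
Imports System.Text

Module CrawlerSketch
    ' Hypothetical endpoints: replace with the real site's URLs.
    Private Const LoginUrl As String = "https://example.com/login"
    Private Const DataUrl As String = "https://example.com/data"

    ' Hypothetical helper: issues a GET (or a form POST when postBody is given)
    ' and returns the response body. Only the requested URL is fetched;
    ' scripts, images and stylesheets referenced by the page are not downloaded.
    Function Fetch(url As String, cookies As CookieContainer,
                   Optional postBody As String = Nothing) As String
        Dim request As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        ' Sharing one container across requests carries the session forward.
        request.CookieContainer = cookies

        If postBody IsNot Nothing Then
            Dim bytes As Byte() = Encoding.UTF8.GetBytes(postBody)
            request.Method = "POST"
            request.ContentType = "application/x-www-form-urlencoded"
            request.ContentLength = bytes.Length
            Using body As Stream = request.GetRequestStream()
                body.Write(bytes, 0, bytes.Length)
            End Using
        End If

        Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        Dim cookies As New CookieContainer()

        ' Assumed form field names ("username", "password"); check the real login form.
        Dim form As String = "username=" & Uri.EscapeDataString("alice") &
                             "&password=" & Uri.EscapeDataString("secret")
        Fetch(LoginUrl, cookies, form)

        ' Cookies set by the server are stored in the container,
        ' including those flagged HttpOnly.
        For Each c As Cookie In cookies.GetCookies(New Uri(LoginUrl))
            Console.WriteLine("{0} (HttpOnly: {1})", c.Name, c.HttpOnly)
        Next

        ' Later requests reuse the same container, so the session cookie is sent back.
        Dim html As String = Fetch(DataUrl, cookies)
        Console.WriteLine("Fetched {0} characters", html.Length)
    End Sub
End Module
```

The HttpOnly flag only stops scripts running inside a browser from reading a cookie. An HTTP client that reads the Set-Cookie headers directly stores such cookies like any others. The same Fetch helper could be pointed at an XHR endpoint, although some servers may expect additional request headers before they respond as they would to the browser.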
