简体   繁体   English

浏览器实例是否从机械化缓存?

[英]Does the browser instance from mechanize cache?

I am doing some webscraping with the mechanize browser and using the following code. 我正在使用机械化浏览器进行一些网络爬虫,并使用以下代码。 I realized in some cases we keep getting the same page, although the remote page is already changed. 我意识到在某些情况下,尽管远程页面已经更改,但我们仍会获得相同的页面。 So my question is: 所以我的问题是:

  1. Does mechanize broswer instance cache page by default (in some config)? 默认情况下(在某些配置中)机械化浏览器实例缓存页面吗?
  2. If so, how can we change it, or is there a way to avoid caching (apart from creating the browser instance every time in the loop we webscrape) 如果是这样,我们如何更改它,或者有一种避免缓存的方法(除了每次在Webscrape循环中创建浏览器实例之外)

     # put int login detail and submit, return a mechanize.Browser instance browser = _login() # main loop while True: rsp = browser.open(URL) html = rsp.read() 

thanks 谢谢

According to this thread , 根据这个线程

Mechanize instances do cache pages you've visited, but you can clear that with agent.history.clear; 机械化实例确实会缓存您访问过的页面,但是您可以使用agent.history.clear;清除它。 or prevent history from being saved by setting agent.history.max_size = 0. Or, you can use a new Mechanize instance altogether. 或通过设置agent.history.max_size = 0阻止保存历史记录。或者,您可以完全使用新的Mechanize实例。

Particularly, 尤其,

Currently Mechanize reuses pages in the history of the session if a request with an If-Modified-Since header results in 304 Not Modified. 当前,如果带有If-Modified-Since标头的请求导致304 Not Modified,则Mechanize重用会话历史记录中的页面。

And by the documentation here , in Python, the following code will prevent the caching-like behavior (seekable responses): 并且通过此处的文档(在Python中),以下代码将防止类似缓存的行为(可寻求的响应):

import mechanize
ua = mechanize.UserAgent()
ua.set_seekable_responses(False)
ua.set_handle_equiv(False)
ua.set_debug_responses(False)

Hope that provides some insight. 希望能提供一些见识。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM