How to cache only HTTP status 200 in Scrapy?
I am using scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware to cache Scrapy requests. I'd like it to cache a response only if its status is 200. Is that the default behavior? Or do I need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200?
By default, HttpCacheMiddleware runs DummyPolicy for the requests. It does pretty much nothing special on its own, so caching only 200 is not the default behavior, and you need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200.
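As a rough illustration, the ignore list could be built in settings.py rather than typed out by hand. This is a minimal sketch, assuming that the range 100-599 covers every status code your targets return; that range is an assumption of this example, not something Scrapy requires.

```python
# settings.py (sketch)

# Turn on the HTTP cache middleware.
HTTPCACHE_ENABLED = True

# Assumption: status codes of interest fall in 100..599.
# Every code in that range except 200 is excluded from the cache.
HTTPCACHE_IGNORE_HTTP_CODES = [code for code in range(100, 600) if code != 200]
```

With this list in place, the default policy skips any response whose status appears in it, so only 200 responses get stored.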
Here's the source for DummyPolicy. These are the lines that actually matter:
class DummyPolicy(object):
    def __init__(self, settings):
        self.ignore_http_codes = [int(x) for x in settings.getlist('HTTPCACHE_IGNORE_HTTP_CODES')]

    def should_cache_response(self, response, request):
        return response.status not in self.ignore_http_codes
So in practice you can also subclass DummyPolicy and override should_cache_response() to check for 200 explicitly, i.e. return response.status == 200, and then set the subclass as your cache policy via the HTTPCACHE_POLICY setting.
The answer is no, you do not need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200. Instead, write a CachePolicy and update settings.py to enable it. I put the CachePolicy class in middlewares.py:
from scrapy.extensions.httpcache import DummyPolicy

class CachePolicy(DummyPolicy):
    def should_cache_response(self, response, request):
        return response.status == 200
Then update settings.py by appending the following line:
HTTPCACHE_POLICY = 'yourproject.middlewares.CachePolicy'