How to cache only HTTP status 200 in Scrapy?

I am using scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware to cache Scrapy requests. I'd like it to cache a response only if its status is 200. Is that the default behavior? Or do I need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200?

Yes, you need to set it. By default, HttpCacheMiddleware runs a DummyPolicy for the requests. That policy does pretty much nothing special on its own: it caches every response whose status is not listed in HTTPCACHE_IGNORE_HTTP_CODES, so you need to set HTTPCACHE_IGNORE_HTTP_CODES to everything except 200.
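As a minimal sketch of that setting, settings.py could build the list programmatically instead of spelling out every code. The 100-599 range is an assumption about which status codes you expect to see, not something Scrapy prescribes; a server that returns a code outside that range would still be cached.

```python
# settings.py (sketch)

# The cache middleware only stores anything when the cache is enabled.
HTTPCACHE_ENABLED = True

# Ignore every status code from 100 to 599 except 200.
# The range is an assumption; widen it if you expect non-standard codes.
HTTPCACHE_IGNORE_HTTP_CODES = [code for code in range(100, 600) if code != 200]
```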

Here's the source for DummyPolicy, and these are the lines that actually matter:

class DummyPolicy(object):

    def __init__(self, settings):
        self.ignore_http_codes = [int(x) for x in settings.getlist('HTTPCACHE_IGNORE_HTTP_CODES')]

    def should_cache_response(self, response, request):
        return response.status not in self.ignore_http_codes

So you can also extend this class and override should_cache_response() to check for 200 explicitly, i.e. return response.status == 200, and then set it as your cache policy via the HTTPCACHE_POLICY setting.
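To make the difference between the two approaches concrete, here is a small sketch that runs without Scrapy. IgnoreListPolicy and Only200Policy are hypothetical stand-ins that only mirror the logic quoted above; they are not Scrapy classes, and they take a bare status code rather than a response object. The 100-599 range for the ignore list is likewise an assumption.

```python
# Stand-in classes that mirror the quoted logic; they are NOT Scrapy's own
# classes, so this sketch runs without Scrapy installed.
class IgnoreListPolicy:
    """Mimics DummyPolicy: cache unless the status is in the ignore list."""

    def __init__(self, ignore_http_codes):
        self.ignore_http_codes = [int(x) for x in ignore_http_codes]

    def should_cache_response(self, status):
        return status not in self.ignore_http_codes


class Only200Policy:
    """Mimics the override: cache only when the status is exactly 200."""

    def should_cache_response(self, status):
        return status == 200


# Assumed ignore list: every code from 100 to 599 except 200.
ignore_list = IgnoreListPolicy(code for code in range(100, 600) if code != 200)
only_200 = Only200Policy()

# Columns: status, cached by ignore list?, cached by explicit check?
for status in (200, 301, 404, 503, 999):
    print(
        status,
        ignore_list.should_cache_response(status),
        only_200.should_cache_response(status),
    )
```

Both policies agree on the ordinary codes. They differ only for a code that falls outside the ignore list, such as the non-standard 999 used here: the ignore list would let it be cached, while the explicit check would not.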

The answer is no, you do not need to do that. Instead, write a CachePolicy and update settings.py to enable your policy. I put the CachePolicy class in middlewares.py:

from scrapy.extensions.httpcache import DummyPolicy

class CachePolicy(DummyPolicy):
    def should_cache_response(self, response, request):
        return response.status == 200

Then update settings.py by appending the following line:

HTTPCACHE_POLICY = 'yourproject.middlewares.CachePolicy'
