
How to push post parameter into scrapy-redis

I have a POST request like this:

def start_requests(self):
    yield FormRequest(url, formdata={'id': "parameter from redis"})

Can I use redis-cli lpush to store the POST parameters so that my crawler picks them up and runs the request?

By default the scrapy-redis queue works only with URLs as messages: one message = one URL. But you can modify this behavior, for example by using an object for your messages/requests:

    class ScheduledRequest:
        def __init__(self, url, method, body):
            self.url = url
            self.method = method
            self.body = body

Pass it to the queue as a JSON-encoded dict:

    redis.lpush(
        queue_key,
        json.dumps(
            ScheduledRequest(
               url='http://google.com',
               method='POST',
               body='some body data ...'
            ).__dict__
        )
    )
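This also answers the redis-cli part of the question: once the consumer expects JSON messages, the same payload can be pushed from the command line. The queue key `myspider:start_urls` below is an assumption (scrapy-redis defaults to the `<spider_name>:start_urls` pattern); use whatever key your spider is actually configured with:

```shell
# Hypothetical queue key -- replace with your spider's redis_key setting.
redis-cli lpush myspider:start_urls \
    '{"url": "http://google.com", "method": "POST", "body": "id=42"}'
```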

Then override the make_request_from_data and schedule_next_requests methods:

import json

import scrapy
from scrapy_redis.spiders import RedisCrawlSpider
from scrapy_redis.utils import bytes_to_str


class MySpiderBase(RedisCrawlSpider, scrapy.Spider):

    def __init__(self, *args, **kwargs):
        super(MySpiderBase, self).__init__(*args, **kwargs)

    def make_request_from_data(self, data):
        # Decode the raw bytes from Redis back into a ScheduledRequest.
        scheduled = ScheduledRequest(
            **json.loads(
                bytes_to_str(data, self.redis_encoding)
            )
        )
        # You could also build a FormRequest here instead.
        return scrapy.Request(url=scheduled.url, method=scheduled.method, body=scheduled.body)

    def schedule_next_requests(self):
        for request in self.next_requests():
            self.crawler.engine.crawl(request, spider=self)

    def parse(self, response):
        pass
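The decoding step inside make_request_from_data can be sketched standalone, without Scrapy or a Redis server. Here `bytes_to_str` is a minimal stand-in for `scrapy_redis.utils.bytes_to_str`, and the raw bytes mimic a message popped from the queue:

```python
import json


def bytes_to_str(data, encoding='utf-8'):
    # Minimal stand-in for scrapy_redis.utils.bytes_to_str.
    return data.decode(encoding) if isinstance(data, bytes) else data


class ScheduledRequest:
    def __init__(self, url, method, body):
        self.url = url
        self.method = method
        self.body = body


# A message as it would come off the Redis list (raw bytes).
raw = b'{"url": "http://google.com", "method": "POST", "body": "id=42"}'
scheduled = ScheduledRequest(**json.loads(bytes_to_str(raw)))
print(scheduled.method)  # POST
```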
