How to push POST parameters into scrapy-redis
I have a POST request like:
def start_requests(self):
    yield FormRequest(url, formdata={'id': "parameter from redis"})
Can I use redis-cli lpush to store the POST parameters so that my crawler picks them up?
By default, the scrapy-redis queue works only with URLs as messages: one message = one URL. But you can modify this behavior. For example, you can use an object for your messages/requests:
class ScheduledRequest:
    def __init__(self, url, method, body):
        self.url = url
        self.method = method
        self.body = body
Pass it to the queue as a JSON-encoded dict:
redis.lpush(
    queue_key,
    json.dumps(
        ScheduledRequest(
            url='http://google.com',
            method='POST',
            body='some body data ...'
        ).__dict__
    )
)
And override the make_request_from_data and schedule_next_requests methods:
class MySpiderBase(RedisCrawlSpider, scrapy.Spider):
    def __init__(self, *args, **kwargs):
        super(MySpiderBase, self).__init__(*args, **kwargs)

    def make_request_from_data(self, data):
        scheduled = ScheduledRequest(
            **json.loads(
                bytes_to_str(data, self.redis_encoding)
            )
        )
        # here you can also use FormRequest
        return scrapy.Request(url=scheduled.url, method=scheduled.method, body=scheduled.body)

    def schedule_next_requests(self):
        for request in self.next_requests():
            self.crawler.engine.crawl(request, spider=self)

    def parse(self, response):
        pass
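Putting the two sides together, here is a minimal, scrapy-free sketch of the round trip. The queue key and body value are illustrative; in practice redis-py's lpush delivers the bytes, and scrapy-redis's bytes_to_str performs the decode shown here:

import json

class ScheduledRequest:
    def __init__(self, url, method, body):
        self.url = url
        self.method = method
        self.body = body

# Producer side: the payload that redis.lpush(queue_key, ...) would send.
message = json.dumps(
    ScheduledRequest(
        url='http://google.com',
        method='POST',
        body='id=123',  # hypothetical body for the example
    ).__dict__
)

# Redis stores list items as bytes; simulate that here.
raw = message.encode('utf-8')

# Consumer side: what make_request_from_data receives and decodes.
scheduled = ScheduledRequest(**json.loads(raw.decode('utf-8')))
print(scheduled.method, scheduled.url)  # POST http://google.com

Because the message is plain JSON, you can push the same payload from redis-cli or any other client, not just Python.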