How do I start crawling with multiple URLs of the same format in Scrapy?
My Scrapy spider needs to start with URLs of the following format:
https://catalog.loc.gov/vwebv/search?searchArg={$variable}&searchCode=GKEY%5E*&searchType=1&limitTo=none&fromYear=&toYear=&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall&limitTo=LANG%3Dall&recCount=1000
where $variable is a parameter that can take many different values (possibly as many as 1000).
How do I implement this?
You could override the start_requests method with something like:
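    from scrapy import Request

    # Define this as a method of your spider class.
    def start_requests(self):
        # base_url must contain a '{}' placeholder where the search term goes
        base_url = 'https://catalog.loc.gov/vwebv/search?...'
        variables = [...]  # the values to substitute for $variable
        for variable in variables:
            url = base_url.format(variable)
            yield Request(url)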
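To make the URL-building step concrete, here is a minimal sketch that uses only the standard library. It assumes the `{$variable}` in the question's URL is replaced by a `str.format` placeholder, and that each value should be URL-encoded before it is substituted (the question does not say whether the values contain spaces or special characters). The names `SEARCH_URL`, `build_search_urls`, and `term`, as well as the sample search terms, are illustrative and not part of the original post.

```python
from urllib.parse import quote_plus

# URL template copied from the question, with {$variable} replaced by a
# named str.format placeholder. The name "term" is an illustrative choice.
SEARCH_URL = (
    "https://catalog.loc.gov/vwebv/search?searchArg={term}"
    "&searchCode=GKEY%5E*&searchType=1&limitTo=none&fromYear=&toYear="
    "&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall"
    "&limitTo=LANG%3Dall&recCount=1000"
)


def build_search_urls(terms):
    """Yield one search URL per term, URL-encoding each term first."""
    for term in terms:
        yield SEARCH_URL.format(term=quote_plus(term))


# Hypothetical search terms; in practice these might be read from a file.
urls = list(build_search_urls(["python", "war and peace"]))
```

Inside the spider, start_requests could then loop over build_search_urls(...) and yield a scrapy.Request for each URL. Because the function is a generator, even a list of 1000 values is turned into requests lazily, one at a time.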