
How do I start crawling with multiple URLs of the same format in Scrapy

My Scrapy spider needs to start with URLs of the following format:

https://catalog.loc.gov/vwebv/search?searchArg={$variable}&searchCode=GKEY%5E*&searchType=1&limitTo=none&fromYear=&toYear=&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall&limitTo=LANG%3Dall&recCount=1000

where $variable is a parameter that can take many different values (possibly as many as 1000).

How do I implement this?

You could override the start_requests method in your spider with something like:

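from scrapy import Request

# Method of your Spider subclass
def start_requests(self):
    # {} is the slot for searchArg; the rest of the query string is elided
    base_url = 'https://catalog.loc.gov/vwebv/search?searchArg={}&...'
    variables = [...]  # your search terms
    for variable in variables:
        url = base_url.format(variable)
        yield Request(url)  # handled by self.parse by default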
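
If the search terms can contain spaces or other special characters, it may be safer to build the query string with urllib.parse.urlencode instead of plain string formatting. The following is only a sketch: the helper names build_search_url and build_search_urls are made up for illustration, and the fixed parameters are copied from the URL in the question.

```python
from urllib.parse import urlencode

BASE_URL = "https://catalog.loc.gov/vwebv/search"

# Fixed query parameters copied from the URL in the question. A list of
# pairs (not a dict) is used because limitTo appears more than once.
FIXED_PARAMS = [
    ("searchCode", "GKEY^*"),
    ("searchType", "1"),
    ("limitTo", "none"),
    ("fromYear", ""),
    ("toYear", ""),
    ("limitTo", "LOCA=all"),
    ("limitTo", "PLAC=all"),
    ("limitTo", "TYPE=all"),
    ("limitTo", "LANG=all"),
    ("recCount", "1000"),
]


def build_search_url(term):
    """Return the catalog search URL for one search term (hypothetical helper)."""
    params = [("searchArg", term)] + FIXED_PARAMS
    # urlencode percent-encodes each value; safe="*" keeps the literal * of GKEY^*
    return BASE_URL + "?" + urlencode(params, safe="*")


def build_search_urls(terms):
    """Yield one URL per term, in the order the terms are given."""
    for term in terms:
        yield build_search_url(term)
```

Inside start_requests you could then loop over build_search_urls(variables) and yield Request(url) for each result, exactly as in the answer above.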
