How to start crawling in Scrapy with multiple URLs of the same format

Question

My Scrapy spider needs to start from URLs of the following format:

https://catalog.loc.gov/vwebv/search?searchArg={$variable}&searchCode=GKEY%5E*&searchType=1&limitTo=none&fromYear=&toYear=&limitTo=LOCA%3Dall&limitTo=PLAC%3Dall&limitTo=TYPE%3Dall&limitTo=LANG%3Dall&recCount=1000

where $variable is a parameter that can take many different values (possibly as many as 1000).

How do I implement this?
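Because the URL template repeats the limitTo key and contains characters such as ^ and = that must be percent-encoded, it can be simpler to build each URL from a list of key/value pairs with urllib.parse.urlencode than by string formatting. A minimal sketch, assuming only searchArg varies and every other parameter is copied unchanged from the URL above; BASE, FIXED_PARAMS and build_search_url are illustrative names, not part of Scrapy:

```python
from urllib.parse import urlencode

# Assumed fixed part of the query, decoded from the URL in the question.
# A list of pairs (not a dict) is used because limitTo appears several times.
BASE = "https://catalog.loc.gov/vwebv/search"
FIXED_PARAMS = [
    ("searchCode", "GKEY^*"),
    ("searchType", "1"),
    ("limitTo", "none"),
    ("fromYear", ""),
    ("toYear", ""),
    ("limitTo", "LOCA=all"),
    ("limitTo", "PLAC=all"),
    ("limitTo", "TYPE=all"),
    ("limitTo", "LANG=all"),
    ("recCount", "1000"),
]


def build_search_url(term):
    """Return the percent-encoded catalog search URL for one search value."""
    params = [("searchArg", term)] + FIXED_PARAMS
    # safe="*" leaves "*" unescaped so the output matches the original URL.
    return BASE + "?" + urlencode(params, safe="*")


# Example: one URL per search value.
urls = [build_search_url(term) for term in ["war and peace", "tolstoy"]]
```

With safe="*", build_search_url("war and peace") reproduces the URL above with searchArg=war+and+peace, and the same function can be called for each of the 1000 values.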

Answer 1

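You could override your spider's start_requests method with something like this. The template needs a {} placeholder where the value goes, because str.format() fills positional arguments into {} fields ({$variable} would raise a KeyError):

from urllib.parse import quote_plus
from scrapy import Request

def start_requests(self):
    # {} marks where each value goes; the rest of the query string from the question replaces "..."
    base_url = 'https://catalog.loc.gov/vwebv/search?searchArg={}&...'
    variables = [...]  # the search values, e.g. loaded from a file
    for variable in variables:
        # quote_plus() percent-encodes spaces and special characters in the value
        url = base_url.format(quote_plus(str(variable)))
        yield Request(url)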

Answer 1
1 2017-10-18 18:41:00
