Give names to start_urls in Scrapy
I am crawling URLs from a CSV file, and each URL has a name. How can I download these URLs and save each one under its name?
reader = csv.reader(open("source1.csv"))
for Name, Sources1 in reader:
    urls.append(Sources1)

class Spider(scrapy.Spider):
    name = "test"
    start_urls = urls[1:]

    def parse(self, response):
        filename = Name + '.pdf'  # how can I get the names I read from the csv file?
Perhaps you want to override the start_requests() method instead of using start_urls? Example:
class MySpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        data = read_csv()
        for d in data:
            yield scrapy.Request(d.url, meta={'name': d.name})
The meta dict set on the request is passed along to the corresponding response, so you can later do:
def parse(self, response):
    name = response.meta.get('name')
    ...