
How can the start_urls for scrapy be imported from csv?

I am trying to crawl several URLs from a CSV file (all in one column). However, the code does not return anything. Thanks, Nicole

import scrapy
from scrapy.http import HtmlResponse
from scrapy.http import Request
import csv

scrapurls = ""

def get_urls_from_csv():
    with open("produktlink_test.csv", 'rbU') as csv_file:
        data = csv.reader(csv_file)
        scrapurls = []
        for row in data:
            scrapurls.append(column)
            return scrapurls

class GetlinksgalaxusSpider(scrapy.Spider):
    name = 'getlinksgalaxus'
    allowed_domains = []
    
    # An dieser Stelle definieren wir unsere Zieldomains
    start_urls = scrapurls

    def parse(self, response):

    ....

Previous answer: How to loop through multiple URLs to scrape from a CSV file in Scrapy?

Also, it's better to put all of your methods inside the Scrapy spider and explicitly add them in start_requests.
