Reading data from multiple URLs, then writing the results to a final CSV

Question

I need to extract data from the script tag of multiple URLs with regex. I've managed to write code that does half of the job. I have a CSV file ('links.csv') that contains all the URLs I need to scrape. I managed to read the CSV and store all the URLs in a variable named 'start_urls'. My problem is that I need a way to read the URLs from 'start_urls' one at a time and run the next part of my code for each one. When I execute my code in the terminal, I receive two errors:

1. ERROR: Error while obtaining start requests
2. TypeError: Request url must be str or unicode, got list

How can I fix my code? I'm a beginner with Scrapy, but I really need this script to work. Thank you in advance!

Here are some examples of the URLs stored in the initial CSV ('links.csv'):

"https://www.samsung.com/uk/smartphones/galaxy-note8/"
"https://www.samsung.com/uk/smartphones/galaxy-s8/"
"https://www.samsung.com/uk/smartphones/galaxy-s9/"

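The example URLs are wrapped in double quotes. If links.csv really stores them that way, url.strip() keeps the quotes as part of the URL string and the request is likely to fail. Reading the file with Python's csv module removes the quotes and yields one plain string per row, which is exactly what scrapy.Request expects (the "got list" TypeError comes from passing a whole list instead). A minimal sketch, assuming the URL sits in the first column of each row; read_urls is a hypothetical helper name, not part of Scrapy:

```python
import csv


def read_urls(path):
    """Return the URLs in `path` as a list of plain strings, one per row.

    csv.reader strips surrounding double quotes, which a bare
    line.strip() would leave in place.
    """
    urls = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            # Skip blank rows; assume the URL is in the first column
            if row and row[0].strip():
                urls.append(row[0].strip())
    return urls


# Each element is a single str, so inside start_requests() it can be
# passed to scrapy.Request one at a time, e.g.:
#     for url in read_urls("links.csv"):
#         yield scrapy.Request(url)
```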
Here is my code: 这是我的代码：

import scrapy
import csv
import re

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        with open('links.csv','r') as csvf:
            for url in csvf:
                yield scrapy.Request(url.strip())

    def parse(self, response):
        # Text of the <script> tag that contains the page variables
        source = response.xpath("//script[contains(., 'COUNTRY_SHOP_STATUS')]/text()").get()

        def get_values(parameter, script):
            # re.escape() treats the "." in pathIndicator.depth_5 literally;
            # [^"]* stops at the first closing quote
            return re.findall(r'%s = "([^"]*)"' % re.escape(parameter), script)[0]

        # Append one row per response; mode 'w' would erase the rows of earlier URLs
        with open('baza.csv', 'a', newline='') as csvfile:
            fieldnames = ['Category', 'Type', 'SK']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            if csvfile.tell() == 0:  # write the header only into an empty file
                writer.writeheader()
            writer.writerow({'Category': get_values("pvi_subtype_name", source),
                             'Type': get_values("pathIndicator.depth_5", source),
                             'SK': get_values("model_name", source)})
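Two details matter for the regex in get_values: the "." in pathIndicator.depth_5 is a regex wildcard unless it is escaped, and a greedy (.*) can run past the closing quote when several assignments share one line. The standalone sketch below extracts the three fields with re.escape and [^"]*. The sample script text is hypothetical; the real Samsung page may format its variables differently, so the pattern is an assumption to check against the actual source.

```python
import re


def get_value(parameter, script):
    """Return the quoted value assigned to `parameter` in `script`, or None.

    re.escape() makes the "." in names like "pathIndicator.depth_5" match
    a literal dot, and [^"]* stops at the closing quote instead of running
    to the last quote on the line as a greedy (.*) can.
    """
    match = re.search(r'%s\s*=\s*"([^"]*)"' % re.escape(parameter), script)
    return match.group(1) if match else None


# Hypothetical script text standing in for the page's <script> tag
sample = '''
var COUNTRY_SHOP_STATUS = "Y";
pvi_subtype_name = "Smartphones"; pathIndicator.depth_5 = "Galaxy S9"; model_name = "SM-G960F"
'''

row = {
    "Category": get_value("pvi_subtype_name", sample),
    "Type": get_value("pathIndicator.depth_5", sample),
    "SK": get_value("model_name", sample),
}
print(row)
```

Returning None instead of indexing [0] avoids an IndexError on pages where a variable is missing; whether to skip or flag such pages is a separate choice.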

Answer 1

Add the following method to the spider:

def start_requests(self):
    # Yield one Request per line of links.csv instead of passing a whole list
    with open('links.csv','r') as csvf:
        for url in csvf:
            yield scrapy.Request(url.strip())

Then remove the previous with ... block from your code.
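The answer fixes how the requests are created; the title also asks for all results in one final CSV. Opening baza.csv with mode 'w' inside parse() truncates the file for every response, so only the last URL's row would remain. In Scrapy the usual alternative is to yield one dict per page from parse() and let feed exports write the file (for example scrapy crawl quotes -o baza.csv). The stdlib sketch below only illustrates the intended result: one header and one row per URL, written once. write_results is a hypothetical helper, and the row values are placeholders standing in for what parse() would extract.

```python
import csv


def write_results(rows, path, fieldnames=("Category", "Type", "SK")):
    """Write every collected row to `path` once: one header, one row per page."""
    # newline="" stops the csv module from producing blank lines on Windows
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames))
        writer.writeheader()
        writer.writerows(rows)


# Placeholder values standing in for what parse() extracts from each URL
rows = [
    {"Category": "Smartphones", "Type": "Galaxy Note8", "SK": "model-a"},
    {"Category": "Smartphones", "Type": "Galaxy S8", "SK": "model-b"},
    {"Category": "Smartphones", "Type": "Galaxy S9", "SK": "model-c"},
]
write_results(rows, "baza.csv")
```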

Answer 1 (score 0, accepted), 2018-08-28 19:44:15
