Scrape data from multiple URLs, then write the results to a final CSV

Question

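I need to use regex to extract data from a script tag on multiple URLs. I have managed to write code that does half of the job. I have a CSV file ('links.csv') that contains all the URLs I need to scrape. I managed to read the CSV and store all the URLs in a variable named 'start_urls'. My problem is that I need a way to read the URLs from 'start_urls' one at a time and then run the next part of my code for each one. When I run the code in the terminal, I get two errors:

1. ERROR: Error while obtaining start requests
2. TypeError: Request url must be str or unicode, got list

How can I fix my code? I'm a beginner with Scrapy, but I really need this script to work... Thank you!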

Here are some examples of the URLs stored in my initial CSV ('links.csv'):

"https://www.samsung.com/uk/smartphones/galaxy-note8/"
"https://www.samsung.com/uk/smartphones/galaxy-s8/"
"https://www.samsung.com/uk/smartphones/galaxy-s9/"

Here is my code:

import scrapy
import csv
import re

class QuotesSpider(scrapy.Spider):
    name = "quotes"

    def start_requests(self):
        with open('links.csv','r') as csvf:
            for url in csvf:
                yield scrapy.Request(url.strip())

    def parse(self, response):
        source = response.xpath("//script[contains(., 'COUNTRY_SHOP_STATUS')]/text()").extract()[0]
        def get_values(parameter, script):
            return re.findall('%s = "(.*)"' % parameter, script)[0]

        with open('baza.csv', 'w') as csvfile:
            fieldnames = ['Category', 'Type', 'SK']
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            writer.writeheader()
            # Each page yields one value per field, so write a single row.
            writer.writerow({'Category': get_values("pvi_subtype_name", source), 'Type': get_values("pathIndicator.depth_5", source), 'SK': get_values("model_name", source)})

Answer 1

Add the following method to your spider:

def start_requests(self):
    with open('links.csv','r') as csvf:
        for url in csvf:
            yield scrapy.Request(url.strip())

Then remove the previous with... block from your code.
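
A fuller sketch of the same idea, assuming Python 3 and that 'links.csv' holds one URL per line, possibly wrapped in double quotes as in the examples in the question. The helper names read_urls, get_value, extract_row and write_rows are made up for illustration; they are not part of Scrapy. csv.reader strips the surrounding quotes, so each yielded value is a single str, which is what scrapy.Request expects. The last helper opens the output file only once, because opening 'baza.csv' in 'w' mode inside parse truncates the file on every response, so only the row from the last page would survive.

```python
import csv
import re

FIELDNAMES = ['Category', 'Type', 'SK']


def read_urls(path):
    """Yield one URL string per non-empty row of a one-column CSV file."""
    with open(path, newline='') as csvf:
        for row in csv.reader(csvf):
            # csv.reader removes surrounding double quotes; skip blank lines.
            if row and row[0].strip():
                yield row[0].strip()


def get_value(parameter, script):
    """Return the first `parameter = "..."` value in script, or '' if absent."""
    # re.escape keeps the dot in names such as pathIndicator.depth_5 literal;
    # [^"]* stops at the closing quote instead of matching greedily.
    pattern = r'%s\s*=\s*"([^"]*)"' % re.escape(parameter)
    match = re.search(pattern, script)
    return match.group(1) if match else ''


def extract_row(script):
    """Build one output row (a dict) from the text of a script tag."""
    return {
        'Category': get_value('pvi_subtype_name', script),
        'Type': get_value('pathIndicator.depth_5', script),
        'SK': get_value('model_name', script),
    }


def write_rows(path, rows):
    """Write the header once, then one line per collected row."""
    with open(path, 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
```

In the spider, start_requests could loop over read_urls('links.csv') and yield scrapy.Request(url) for each value, and parse could yield extract_row(source) instead of writing the file itself. Running scrapy crawl quotes -o baza.csv would then let Scrapy's feed export collect every row into one CSV, so write_rows is only needed if you gather the rows yourself.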

Solution 1
0, accepted, 2018-08-28 19:44:15
