
Scrapy pass extra data from csv file into parse

My spider reads a CSV file and builds start_urls from the addresses in it, like this:

from csv import DictReader

with open('addresses.csv') as rows:
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in DictReader(rows)]
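The `.replace(',', '').replace(' ', '+')` chain only handles spaces; any other reserved character in an address (for example `&` or `#`) would end up unescaped in the query string. A minimal sketch, swapping in the standard library's `urllib.parse.urlencode` while keeping the original comma-stripping; `build_search_url` is a hypothetical helper name:

```python
from urllib.parse import urlencode

SEARCH_BASE = 'http://www.example.com/search/?'


def build_search_url(address: str) -> str:
    """Build the search URL for one address (hypothetical helper).

    Commas are dropped, as in the original code; urlencode then turns
    spaces into '+' and percent-escapes other reserved characters.
    """
    return SEARCH_BASE + urlencode({'where': address.replace(',', '')})


print(build_search_url('123 Main St, Springfield'))
# http://www.example.com/search/?where=123+Main+St+Springfield
```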

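But the CSV file also contains emails and other information. How can I pass this extra information into parse so that I can write it to a new file together with the scraped data?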

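import scrapy
from csv import DictReader

class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical name; every Scrapy spider needs one

    # Read the rows once: a DictReader exhausts its file handle, so building
    # each list from a new DictReader(rows) would leave emails and start_urls empty.
    with open('addresses.csv') as f:
        rows = list(DictReader(f))
    names = [row["Name"].replace(',', '') for row in rows]
    emails = [row["Email"].replace(',', '') for row in rows]
    start_urls = ['http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+') for row in rows]

    def parse(self, response):
        yield {
            'name': ...,     # FROM CSV: but which row does this response belong to?
            'email': ...,    # FROM CSV
            'address': ...,  # FROM SCRAPING
            'city': ...,     # FROM SCRAPING
        }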
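Building each list from its own `DictReader(rows)` call does not work, because a `csv.DictReader` consumes the file handle it wraps: once one reader has reached the end, a new `DictReader` over the same handle yields no rows. A minimal demonstration with an in-memory file (the sample rows are made up), followed by the fix of reading the rows once:

```python
import io
from csv import DictReader

# Made-up sample data standing in for addresses.csv
SAMPLE = "Name,Email,Address\nAlice,alice@example.com,1 First St\nBob,bob@example.com,2 Second St\n"

rows = io.StringIO(SAMPLE)
names = [row["Name"] for row in DictReader(rows)]    # consumes the whole handle
emails = [row["Email"] for row in DictReader(rows)]  # handle already at EOF
print(names, emails)  # ['Alice', 'Bob'] []

# Fix: materialise the rows once, then build every list from the same rows
rows = list(DictReader(io.StringIO(SAMPLE)))
names = [row["Name"] for row in rows]
emails = [row["Email"] for row in rows]
print(names, emails)  # ['Alice', 'Bob'] ['alice@example.com', 'bob@example.com']
```

Even with the fix, parallel lists leave parse with no way to tell which row a given response came from; that is what the per-request approach in the answer solves.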

Answer:

import scrapy
from csv import DictReader

class MySpider(scrapy.Spider):
    name = 'myspider'  # hypothetical name; every Scrapy spider needs one

    def start_requests(self):
        with open('addresses.csv') as rows:
            for row in DictReader(rows):
                name = row["Name"].replace(',', '')
                email = row["Email"].replace(',', '')

                link = 'http://www.example.com/search/?where=' + row["Address"].replace(',', '').replace(' ', '+')

                # Carry the CSV values along with the request; Scrapy exposes
                # them again as response.meta in the callback.
                yield scrapy.Request(url=link,
                                     callback=self.parse,
                                     method="GET",
                                     meta={'name': name, 'email': email})

    def parse(self, response):
        yield {
            'name': response.meta['name'],
            'email': response.meta['email'],
            'address': ...,  # FROM SCRAPING
            'city': ...,     # FROM SCRAPING
        }
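  • Open your CSV file.
  • Iterate over it in the start_requests method.
  • Pass the extra values to the callback through the request's meta argument, which accepts a Python dict; the callback reads them back from response.meta.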
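The three steps above can be sketched without a running crawler. In this minimal, stdlib-only sketch, `FakeRequest`, `FakeResponse`, and `crawl` are hypothetical stand-ins for what Scrapy does (it hands `request.meta` back as `response.meta`), and the scraped address/city values are placeholders for real selector calls. It also writes the merged items to a new CSV with `csv.DictWriter`, which is what the question ultimately wants; in a real project, Scrapy's feed exports (`scrapy crawl <spider> -o out.csv`) can handle that part.

```python
import csv
import io
from dataclasses import dataclass, field
from urllib.parse import urlencode


# Hypothetical stand-ins for scrapy.Request and a Scrapy response.
@dataclass
class FakeRequest:
    url: str
    meta: dict = field(default_factory=dict)


@dataclass
class FakeResponse:
    url: str
    meta: dict


def start_requests(csv_file):
    """Steps 1-2: iterate the CSV and attach each row's extra fields as meta."""
    for row in csv.DictReader(csv_file):
        url = 'http://www.example.com/search/?' + urlencode(
            {'where': row["Address"].replace(',', '')})
        yield FakeRequest(url, meta={'name': row["Name"].replace(',', ''),
                                     'email': row["Email"].replace(',', '')})


def parse(response):
    """Step 3: read the CSV values back from meta and merge in scraped fields."""
    # Placeholders: a real spider would extract these with response.css()/xpath()
    scraped = {'address': 'scraped address', 'city': 'scraped city'}
    yield {'name': response.meta['name'], 'email': response.meta['email'], **scraped}


def crawl(csv_file):
    """Mimic the engine: 'download' each request and pass its meta to the response."""
    for request in start_requests(csv_file):
        response = FakeResponse(request.url, meta=request.meta)
        yield from parse(response)


SAMPLE = "Name,Email,Address\nAlice,alice@example.com,\"1 First St, Springfield\"\n"
items = list(crawl(io.StringIO(SAMPLE)))

# Write the merged items to a new CSV file (in memory here)
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['name', 'email', 'address', 'city'])
writer.writeheader()
writer.writerows(items)
print(out.getvalue())
```

Scrapy 1.7 and later also offer a `cb_kwargs` argument on `Request`, which the Scrapy documentation suggests for passing user data to callbacks; the values then arrive as keyword arguments of `parse` rather than in `response.meta`.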

Note: start_requests is not a custom method I made up; it is a built-in method of Scrapy's Spider class that you override. See https://doc.scrapy.org/en/latest/topics/spiders.html#scrapy.spiders.Spider.start_requests

