Retrieving data from a database in Scrapy

Question

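In Scrapy, I am trying to retrieve data from a database. The data was scraped by one spider and written to the database in pipelines.py, and I want to use it in another spider. Specifically, I want to retrieve the links from the database and use them in the start_requests function. I know this problem is also explained here, but I cannot get it to work. I know I am making a mistake somewhere, but I don't know where.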

pipelines.py
import sqlite3

class HeurekaScraperPipeline:

    def __init__(self):
        self.create_connection()
        self.create_table()

    def create_connection(self):
        self.conn = sqlite3.connect('shops.db')
        self.curr = self.conn.cursor()

    def create_table(self):
        self.curr.execute("""DROP TABLE IF EXISTS shops_tb""")
        self.curr.execute("""create table shops_tb(
                        product_name text, 
                        shop_name text, 
                        price text, 
                        link text
                        )""")

    def process_item(self, item, spider):
        self.store_db(item)
        return item

    def store_db(self, item):
        self.curr.execute("""insert into shops_tb values (?, ?, ?, ?)""",(
            item['product_name'],
            item['shop_name'],
            item['price'],
            item['link'],
        ))

        self.conn.commit()

spider
class Shops_spider(scrapy.Spider):
    name = 'shops_scraper'
    custom_settings = {'DOWNLOAD_DELAY': 1}
    def start_requests(self):
        db_cursor = HeurekaScraperPipeline().curr
        db_cursor.execute("SELECT * FROM shops_tb")

        links = db_cursor.fetchall()
        for link in links:
            url = link[3]
            print(url)
            yield scrapy.Request(url=url, callback=self.parse)
    def parse(self, response):
        url = response.request.url
        print('********************************'+url+'************************')

Thanks in advance for your help.

Answer 1

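Pipelines are for processing items. If you want to read something from the database, open a connection and read it in start_requests. According to the documentation: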

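After an item has been scraped by a spider, it is sent to the Item Pipeline which processes it through several components that are executed sequentially.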
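To make the "pipelines only process items" point concrete, here is a minimal sketch of a write-only pipeline. The class name ShopsWriterPipeline and the db_path parameter are made up for illustration; open_spider, process_item and close_spider are the standard pipeline hooks. It assumes the same shops_tb schema as in the question, and it uses CREATE TABLE IF NOT EXISTS instead of dropping the table, so rows from an earlier run are still there for another spider to read.

```python
import sqlite3


class ShopsWriterPipeline:
    """Hypothetical write-only pipeline: it stores items and does nothing else."""

    def __init__(self, db_path='shops.db'):
        self.db_path = db_path

    def open_spider(self, spider):
        # Called by Scrapy when the spider is opened.
        self.conn = sqlite3.connect(self.db_path)
        # IF NOT EXISTS keeps the rows written by earlier runs.
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS shops_tb(
                   product_name text, shop_name text, price text, link text)"""
        )

    def process_item(self, item, spider):
        # Insert one scraped item and commit immediately.
        self.conn.execute(
            "INSERT INTO shops_tb VALUES (?, ?, ?, ?)",
            (item['product_name'], item['shop_name'], item['price'], item['link']),
        )
        self.conn.commit()
        return item

    def close_spider(self, spider):
        # Called by Scrapy when the spider is closed.
        self.conn.close()
```

Whether you want to keep old rows or start from an empty table on every run depends on your use case; this is only one way to set it up.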

Why not open the DB connection in start_requests?

def start_requests(self):
    # Needs `import sqlite3` at the top of the spider module.
    self.conn = sqlite3.connect('shops.db')
    self.curr = self.conn.cursor()
    self.curr.execute("SELECT * FROM shops_tb")
    links = self.curr.fetchall()
    # rest of the code
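As for why the original attempt returns nothing: calling HeurekaScraperPipeline() inside start_requests runs its __init__, which calls create_table, and that executes DROP TABLE IF EXISTS shops_tb before the SELECT ever runs. The same would presumably happen if that pipeline stays enabled in ITEM_PIPELINES for the second spider. A small sketch of reading the links without touching the pipeline, assuming the shops_tb schema from the question (the helper name load_links is hypothetical):

```python
import sqlite3


def load_links(db_path='shops.db'):
    """Return every non-empty link stored in shops_tb (helper name is hypothetical)."""
    conn = sqlite3.connect(db_path)
    try:
        # Select only the column that is needed instead of SELECT *.
        rows = conn.execute("SELECT link FROM shops_tb").fetchall()
    finally:
        conn.close()
    # Each row is a 1-tuple; skip NULL or empty links.
    return [row[0] for row in rows if row[0]]


# Inside the spider, start_requests would then reduce to:
#
#     def start_requests(self):
#         for url in load_links():
#             yield scrapy.Request(url=url, callback=self.parse)
```

Selecting the link column by name also avoids relying on link[3] being the right position if the table layout changes later.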
