
How to get depth of each crawler with Scrapy

Is there a way to keep track of the depth of each crawl?

I am scraping some websites recursively.

My setup is similar to the code below.

import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:   # placeholder condition
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if condition_is_satisfied:   # placeholder condition
            with open(filename, "a") as file:
                file.write(...)      # record depth and url here
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})

This approach doesn't work.

For example, I want to record each crawler's activity like this:

crawler depth1 URL1
crawler depth2 URL2
...

Thank you in advance.

I think you are almost there. Please try this code.

import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:   # placeholder condition
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if condition_is_satisfied:   # placeholder condition
            # open in append mode so earlier records are not overwritten
            with open(filename, "a") as f:
                f.write(url + " " + str(cur_crawl_depth) + "\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
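
As a side note, Scrapy's built-in DepthMiddleware is enabled by default and already stores the crawl depth of every response in response.meta['depth'] (0 for start requests, incremented by 1 for each request scheduled from a response), so you usually do not have to set that key yourself. The following is only a minimal sketch of that alternative; the spider name, the example.com start URL, and the depth_log.txt filename are assumptions made for illustration.

import scrapy

class DepthLogSpider(scrapy.Spider):
    # hypothetical spider, for illustration only
    name = "depth_log"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # DepthMiddleware (on by default) sets 'depth' on every response:
        # 0 for start requests, +1 for each request followed from a response
        depth = response.meta.get("depth", 0)

        # append one line per crawled page: depth and URL
        with open("depth_log.txt", "a") as f:
            f.write("depth%d %s\n" % (depth, response.url))

        # follow links; their responses carry depth + 1 automatically
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

If the recursion needs a cap, the DEPTH_LIMIT setting stops scheduling requests beyond a given depth, and self.logger.info(...) can replace the manual file writes if you prefer Scrapy's own logging.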
