
How to get depth of each crawler with Scrapy

Is there a way to keep track of the depth of each crawl?

I am scraping some websites recursively.

My setup is similar to the code below.

import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:   # placeholder condition
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if condition_is_satisfied:   # placeholder condition
            with open(filename, "a") as file:
                file.write(...)      # record depth and url here
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})

This approach doesn't work.

For example, I want to record each crawler's activity like this:

crawler depth1 URL1
crawler depth2 URL2
...

Thank you in advance.

I think you are almost there. Please try this code.

import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:   # placeholder condition
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if condition_is_satisfied:   # placeholder condition
            # open in append mode so earlier records are not overwritten
            with open(filename, "a") as f:
                f.write(url + " " + str(cur_crawl_depth) + "\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
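
As a side note, Scrapy's built-in DepthMiddleware is enabled by default and already stores the crawl depth of every response in response.meta['depth'] (0 for start requests, incremented by 1 for each request scheduled from a response), so you usually do not have to set that key yourself. The following is only a minimal sketch of that alternative; the spider name, the example.com start URL, and the depth_log.txt filename are assumptions made for illustration.

import scrapy

class DepthLogSpider(scrapy.Spider):
    # hypothetical spider, for illustration only
    name = "depth_log"
    start_urls = ["https://example.com"]

    def parse(self, response):
        # DepthMiddleware (on by default) sets 'depth' on every response:
        # 0 for start requests, +1 for each request followed from a response
        depth = response.meta.get("depth", 0)

        # append one line per crawled page: depth and URL
        with open("depth_log.txt", "a") as f:
            f.write("depth%d %s\n" % (depth, response.url))

        # follow links; their responses carry depth + 1 automatically
        for href in response.css("a::attr(href)").getall():
            yield response.follow(href, callback=self.parse)

If the recursion needs a cap, the DEPTH_LIMIT setting stops scheduling requests beyond a given depth, and self.logger.info(...) can replace the manual file writes if you prefer Scrapy's own logging.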
