How to get the depth of each crawl with Scrapy
Is there a way to track the depth of each crawl?
I am recursively scraping some websites.
My setup resembles the code below.
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:  # placeholder condition
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        next_crawl_depth = response.meta['depth'] + 1
        if condition_is_satisfied:  # placeholder condition
            with open(filename, "a") as file:
                file.write(...)  # record depth and url
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
This approach doesn't work.
For example, I would like to log each crawl's activity like this:
crawler depth1 URL1
crawler depth2 URL2
...
Thank you in advance.
I think you are almost there; please try the code below. Note that Scrapy's built-in DepthMiddleware (enabled by default) already tracks depth and exposes it as response.meta['depth'], so you can also rely on that instead of passing your own counter.
import scrapy

class Crawl(scrapy.Spider):
    name = "Crawl"

    def start_requests(self):
        if condition_is_satisfied:  # placeholder condition
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': 1})

    def parse(self, response):
        cur_crawl_depth = response.meta['depth']
        next_crawl_depth = cur_crawl_depth + 1
        if condition_is_satisfied:  # placeholder condition
            # open in append mode ("a"); "w+" would truncate the log file
            # on every call, keeping only the last record
            with open(filename, "a") as f:
                f.write(url + " " + str(cur_crawl_depth) + "\n")
            yield scrapy.Request(url=url,
                                 callback=self.parse,
                                 meta={'depth': next_crawl_depth})
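To make the depth-carrying idea concrete without running a real spider, here is a Scrapy-free sketch of the same pattern: each queued "request" carries a meta dict, and the callback increments the depth before enqueuing child requests. The link graph, `crawl` function, and `max_depth` parameter are invented for illustration; they are not part of the original code.

```python
from collections import deque

# A made-up link graph standing in for pages discovered during parsing.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/a/1"],
    "http://example.com/b": [],
    "http://example.com/a/1": [],
}

def crawl(start_url, max_depth=3):
    log = []  # (depth, url) records, analogous to the file writes above
    queue = deque([(start_url, {"depth": 1})])
    seen = {start_url}
    while queue:
        url, meta = queue.popleft()
        log.append((meta["depth"], url))
        next_depth = meta["depth"] + 1
        if next_depth <= max_depth:
            for link in LINKS.get(url, []):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, {"depth": next_depth}))
    return log

print(crawl("http://example.com/"))
# → [(1, 'http://example.com/'), (2, 'http://example.com/a'),
#    (2, 'http://example.com/b'), (3, 'http://example.com/a/1')]
```

The same bookkeeping happens in the spider: the meta dict travels with each request, and the callback reads the current depth, logs it, and passes the incremented value on.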