
How does Scrapy write to the log while running a spider?

While running a Scrapy spider, I see log messages tagged "DEBUG:", such as:

1. DEBUG: Crawled (200) <GET http://www.example.com> (referer: None)
2. DEBUG: Scraped from <200 http://www.example.com>

I want to know:

1. What do "Crawled" and "Scraped from" mean?
2. Where were both of the above URLs returned from (i.e., while scraping the page, which variable/argument holds those URLs)?

Let me try to explain based on the Scrapy sample code shown on the Scrapy website. I saved this in a file called scrapy_example.py.

from scrapy import Spider, Item, Field

class Post(Item):
    title = Field()

class BlogSpider(Spider):
    name, start_urls = 'blogspider', ['http://blog.scrapinghub.com']

    def parse(self, response):
        return [Post(title=e.extract()) for e in response.css("h2 a::text")]

Executing this with the command scrapy runspider scrapy_example.py will produce the following output:

(...)
DEBUG: Crawled (200) <GET http://blog.scrapinghub.com> (referer: None) ['partial']
DEBUG: Scraped from <200 http://blog.scrapinghub.com>
    {'title': u'Using git to manage vacations in a large distributed\xa0team'}
DEBUG: Scraped from <200 http://blog.scrapinghub.com>
    {'title': u'Gender Inequality Across Programming\xa0Languages'}
(...)

Crawled means: Scrapy has downloaded that webpage.

Scraped means: Scrapy has extracted some data from that webpage.

The URL is given in the script as the start_urls parameter.

Your output must have been generated by running a spider. Search the file where that spider is defined and you should be able to spot the place where the URL is defined.
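As a side note, those "Crawled"/"Scraped from" lines are emitted at DEBUG level, so if they are too noisy you can raise the threshold with Scrapy's standard LOG_LEVEL setting; a minimal settings fragment:

```python
# settings.py -- LOG_LEVEL is a standard Scrapy setting
LOG_LEVEL = 'INFO'  # hides the DEBUG "Crawled"/"Scraped from" lines
```

The same setting can be passed on the command line, e.g. scrapy runspider scrapy_example.py -s LOG_LEVEL=INFO.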
