Scrapy: Simple Project

I want to start a simple Scrapy project. It is a Python project created in Visual Studio, and VS is running in administrator mode. Unfortunately, parse(...) is never called, although it should be.

import scrapy
from scrapy.crawler import CrawlerProcess
import logging

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').extract_first()}

        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)
        logging.error("this should be printed")

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(BlogSpider)
process.start()
print("ready")

EDIT: my output:

2018-09-22 08:23:02 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: scrapybot)
2018-09-22 08:23:02 [scrapy.utils.log] INFO: Versions: lxml 4.2.5.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.0, w3lib 1.19.0, Twisted 18.7.0, Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)], pyOpenSSL 18.0.0 (OpenSSL 1.1.0i  14 Aug 2018), cryptography 2.3.1, Platform Windows-10-10.0.17134-SP0
2018-09-22 08:23:02 [scrapy.crawler] INFO: Overridden settings: {'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'}
2018-09-22 08:23:02 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
ready

Note: Twisted was installed from https://www.lfd.uci.edu/~gohlke/pythonlibs/ .

This looks like purely an indentation problem. Once I fixed the indentation, it started working. Output:

2018-09-22 11:35:47 [root] ERROR: this should be printed

My code snippet is the same:

import scrapy
from scrapy.crawler import CrawlerProcess
import logging

class BlogSpider(scrapy.Spider):
    name = 'blogspider'
    start_urls = ['https://blog.scrapinghub.com']

    def parse(self, response):
        logging.error("this should be printed")
        for title in response.css('.post-header>h2'):
            yield {'title': title.css('a ::text').extract_first()}
        for next_page in response.css('div.prev-post > a'):
            yield response.follow(next_page, self.parse)
        logging.error("this should be printed")

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(BlogSpider)
process.start()
print("ready")

Attaching a Pastebin paste: https://pastebin.com/pDu8kW27
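A side note on where the logging call sits in parse: because parse contains yield, it is a generator function, so its body only runs when something iterates over the result (Scrapy does this for you once a response arrives). The sketch below illustrates this with plain Python and no Scrapy at all; parse_like and events are hypothetical names made up for the illustration.

```python
# Records what the generator body does, in order.
events = []

def parse_like(titles):
    """Hypothetical stand-in for a spider's parse() callback."""
    events.append("start")          # runs only once iteration begins
    for title in titles:
        yield {"title": title}
    events.append("end")            # runs only when the generator is exhausted

gen = parse_like(["first", "second"])
events_before = list(events)        # calling the function alone runs none of the body

results = list(gen)                 # iterating executes the body from start to end
```

So a log line at the end of parse appears only after every yielded item and request has been consumed, while one at the top appears as soon as iteration begins.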

I installed Anaconda and then ran conda install -c conda-forge scrapy (I got some errors).

Now everything works fine.

Installation guide
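Since reinstalling fixed it, it may help to confirm which interpreter Visual Studio is actually running and whether that interpreter can see Scrapy and Twisted. This is a minimal sketch using only the standard library; check_packages is a hypothetical helper name, not part of Scrapy.

```python
import importlib.util
import sys

def check_packages(names):
    """Return {name: bool} telling whether each top-level package can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

if __name__ == "__main__":
    # Shows which Python environment is in use (e.g. Anaconda vs. another install).
    print("interpreter:", sys.executable)
    for name, found in check_packages(["scrapy", "twisted"]).items():
        print(name, "found" if found else "MISSING")
```

Running this from the same Visual Studio project shows whether the environment that launches the spider is the one where Scrapy was installed.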
