
Error running scrapy from script

I'm trying to run a Scrapy spider from a script instead of running it from the command line like this:

scrapy crawl spidername

In the Scrapy documentation I found the following example: https://doc.scrapy.org/en/latest/topics/practices.html

Now, my code looks like this:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader
from properties.items import PropertiesItem


class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('title', '//h1[1]/text()')

        return l.load_item()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # the script will block here until the crawling is finished

When I run this script I get the following error:

File "/Library/Python/2.7/site-packages/Twisted-16.7.0rc1-py2.7-macosx-10.11-intel.egg/twisted/internet/_sslverify.py", line 38, in <module>
    TLSVersion.TLSv1_1: SSL.OP_NO_TLSv1_1,
AttributeError: 'module' object has no attribute 'OP_NO_TLSv1_1'

So my questions are:

1) What kind of error is this? I haven't been able to find any examples online.

2) What can I change to make scrapy run from this script?

Update:

Added the packages installed for the project:

attrs==16.3.0
Automat==0.3.0
cffi==1.9.1
characteristic==14.3.0
constantly==15.1.0
cryptography==1.7.1
cssselect==1.0.0
enum34==1.1.6
idna==2.2
incremental==16.10.1
ipaddress==1.0.17
lxml==3.7.1
parsel==1.1.0
pyasn1==0.1.9
pyasn1-modules==0.0.8
pycparser==2.17
PyDispatcher==2.0.5
pyOpenSSL==0.15.1
queuelib==1.4.2
Scrapy==1.3.0
service-identity==16.0.0
six==1.10.0
tree==0.1.0
Twisted==16.6.0
virtualenv==15.1.0
w3lib==1.16.0
zope.interface==4.3.3
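For context, the pyOpenSSL==0.15.1 pin above predates the OP_NO_TLSv1_1 constant that newer Twisted releases look up at import time when they build their TLS-version table, which would explain the AttributeError. The failure mode can be sketched with the standard library's ssl module standing in for the pyOpenSSL binding (the helper name here is illustrative, not Twisted's actual code):

```python
import ssl

# Twisted does roughly this at import time: it looks up flag constants
# on the SSL binding to build a {TLS version: disable-flag} table.  If
# the installed binding is too old to define a flag, the lookup raises
# AttributeError -- before any spider code ever runs.
def lookup_flag(module, name):
    try:
        return getattr(module, name)
    except AttributeError:
        return None  # the installed binding does not define this flag

# Present in any modern build:
assert lookup_flag(ssl, "OP_NO_TLSv1_1") is not None
# An unknown flag name reproduces the failure mode:
assert lookup_flag(ssl, "OP_NO_TLSv9_9") is None
```

That is why the traceback points into Twisted's own `_sslverify.py` rather than into the spider: the error fires while the dependency itself is being imported.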

I found a solution:

Created a new virtual environment based on Python 3.6 instead of Python 2.7. I ran the exact same code (though I had to replace urlparse with urllib.parse) and it worked!
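The steps above can be sketched as follows (the environment name is illustrative, and this assumes a `python3` interpreter is on PATH):

```shell
# create a fresh virtual environment on Python 3 instead of Python 2.7
python3 -m venv scrapy-env
. scrapy-env/bin/activate

# reinstall the dependencies inside it; pip resolves a matching
# Twisted / pyOpenSSL pair automatically
pip install --upgrade pip
pip install scrapy

# confirm the interpreter version and that Scrapy imports cleanly
python -c "import sys; print(sys.version)"
python -c "import scrapy; print(scrapy.__version__)"
```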

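The urlparse change mentioned above is just the Python 3 standard-library reorganization: the Python 2 `urlparse` module became `urllib.parse`. For example:

```python
# Python 2:  from urlparse import urljoin, urlparse
# Python 3:
from urllib.parse import urljoin, urlparse

base = "http://www.example.com/listings/"
print(urljoin(base, "item1.html"))  # http://www.example.com/listings/item1.html
print(urlparse(base).netloc)        # www.example.com
```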
1) I am not sure.

2) But your indentation needs a review:

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.loader import ItemLoader
from properties.items import PropertiesItem


class MySpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    start_urls = ['http://www.example.com']

    def parse(self, response):
        l = ItemLoader(item=PropertiesItem(), response=response)
        l.add_xpath('title', '//h1[1]/text()')

        return l.load_item()

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()  # the script will block here until the crawling is finished

and I am assuming various other things from the examples in the code, i.e. that to run this spider from the command line you would enter

scrapy crawl basic

and that you have a folder called "properties" with the file "items" in it, etc.
