Scrapy throwing up Traceback when trying to parse tabulated data
I am running Scrapy on Python 2.7 (64-bit) on Windows Vista 64-bit. I have some Scrapy code that tries to parse data from a table at the URL given in the following code:
from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
from scrapy.cmdline import execute
import re

class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney"]

def parse(self, response):
    for row in response.selector.xpath('//table[@id="player-fixture"]//tr[td[@class="tournament"]]'):
        # Does this row contain goal symbols?
        list_of_goals = row.xpath('//span[@title="Goal"')
        if list_of_goals:
            print remove_tags(list_of_goals).encode('utf-8')

execute(['scrapy','crawl','wiki'])
However, it is throwing the following error:
Traceback (most recent call last):
  File "c:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "c:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "c:\Python27\lib\site-packages\twisted\internet\defer.py", line 383, in callback
    self._startRunCallbacks(result)
  File "c:\Python27\lib\site-packages\twisted\internet\defer.py", line 491, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "c:\Python27\lib\site-packages\twisted\internet\defer.py", line 578, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\Python27\lib\site-packages\scrapy\spider.py", line 56, in parse
    raise NotImplementedError
exceptions.NotImplementedError:
Can anyone tell me what the issue is here? I am trying to print every item in the table to the screen, including the data in the goals and assists columns.
Thanks
Your indentation is wrong; parse needs to be indented inside the class:
class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney"]

    def parse(self, response):
        for row in response.selector.xpath('//table[@id="player-fixture"]//tr[td[@class="tournament"]]'):
            # Does this row contain goal symbols?
            list_of_goals = row.xpath('//span[@title="Goal"')
            if list_of_goals:
                print remove_tags(list_of_goals).encode('utf-8')
Implementing a parse method is a requirement when you use the Spider class; this is what the method looks like in the source code:
def parse(self, response):
    raise NotImplementedError
Your indentation was wrong, so parse was not part of the class, and therefore you had not implemented the required method.
The raise NotImplementedError is there to ensure you write the required parse method when inheriting from the Spider base class.
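To see why the traceback ends in the base class, here is a minimal sketch in plain Python with no Scrapy involved. BaseSpider is a hypothetical stand-in for Scrapy's Spider class, and BrokenSpider, FixedSpider and try_parse are names made up for the illustration:

```python
class BaseSpider(object):
    """Hypothetical stand-in for scrapy's Spider base class."""

    def parse(self, response):
        # Subclasses are expected to override this method.
        raise NotImplementedError


class BrokenSpider(BaseSpider):
    name = "broken"


# Defined at module level, so it is NOT a method of BrokenSpider.
def parse(self, response):
    return "parsed " + response


class FixedSpider(BaseSpider):
    name = "fixed"

    # Indented inside the class, so it overrides BaseSpider.parse.
    def parse(self, response):
        return "parsed " + response


def try_parse(spider, response):
    """Call spider.parse and report whether the base method was hit."""
    try:
        return spider.parse(response)
    except NotImplementedError:
        return "NotImplementedError"


broken_result = try_parse(BrokenSpider(), "page")
fixed_result = try_parse(FixedSpider(), "page")
print(broken_result)  # NotImplementedError
print(fixed_result)   # parsed page
```

BrokenSpider falls back to the inherited parse because the module-level function is never attached to the class, which matches the NotImplementedError raised from scrapy\spider.py in the traceback.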
You now just have to find the correct xpath ;)
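On the xpath itself, two things look off in the code above: the expression '//span[@title="Goal"' is missing its closing ], and an xpath that starts with // searches the whole document rather than the current row, whereas starting it with .// keeps the search relative to the row. Here is a rough sketch of the relative form, using the standard library's ElementTree as a stand-in for Scrapy selectors and a made-up table (the real markup on whoscored.com may well differ):

```python
import xml.etree.ElementTree as ET

# Invented sample markup, only loosely modelled on the table in the question.
SAMPLE = """
<table id="player-fixture">
  <tr><td class="tournament">EPL</td><td><span title="Goal"></span><span title="Goal"></span></td></tr>
  <tr><td class="tournament">EPL</td><td></td></tr>
  <tr><td class="tournament">UCL</td><td><span title="Goal"></span></td></tr>
</table>
"""

table = ET.fromstring(SAMPLE.strip())

goals_per_row = []
for row in table.findall(".//tr"):
    # ".//" searches only below the current row, not the whole table.
    goals = row.findall(".//span[@title='Goal']")
    goals_per_row.append(len(goals))

print(goals_per_row)  # [2, 0, 1]
```

In the spider, the equivalent change would be row.xpath('.//span[@title="Goal"]'), assuming the goal icons really are span elements with that title attribute.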