
Scrapy throwing a traceback when trying to parse tabulated data

I am running Scrapy on 64-bit Python 2.7 under 64-bit Windows Vista. I have some Scrapy code that tries to parse the data in a table at the URL given in the following code:

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.utils.markup import remove_tags
from scrapy.cmdline import execute
import re


class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney"]

def parse(self, response):

    for row in response.selector.xpath('//table[@id="player-fixture"]//tr[td[@class="tournament"]]'):
    # Is this row contains goal symbols?
        list_of_goals = row.xpath('//span[@title="Goal"')
        if list_of_goals:
            print remove_tags(list_of_goals).encode('utf-8')     

execute(['scrapy','crawl','wiki'])

However, it throws the following error:

Traceback (most recent call last):
  File "c:\Python27\lib\site-packages\twisted\internet\base.py", line 1201, in mainLoop
    self.runUntilCurrent()
  File "c:\Python27\lib\site-packages\twisted\internet\base.py", line 824, in runUntilCurrent
    call.func(*call.args, **call.kw)
  File "c:\Python27\lib\site-packages\twisted\internet\defer.py", line 383, in callback
    self._startRunCallbacks(result)
  File "c:\Python27\lib\site-packages\twisted\internet\defer.py", line 491, in _startRunCallbacks
    self._runCallbacks()
--- <exception caught here> ---
  File "c:\Python27\lib\site-packages\twisted\internet\defer.py", line 578, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "c:\Python27\lib\site-packages\scrapy\spider.py", line 56, in parse
    raise NotImplementedError
exceptions.NotImplementedError:

Can anyone tell me what the issue is here? I am trying to print every item in the table to the screen, including the data in the goals and assists columns.

Thanks

Your indentation is wrong:

class MySpider(Spider):
    name = "wiki"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/Players/3859/Fixtures/Wayne-Rooney"]

    def parse(self, response):
        for row in response.selector.xpath('//table[@id="player-fixture"]//tr[td[@class="tournament"]]'):
            # Does this row contain goal symbols?
            list_of_goals = row.xpath('.//span[@title="Goal"]')
            for goal in list_of_goals.extract():
                print remove_tags(goal).encode('utf-8')

Implementing a parse method is a requirement when you use the Spider class. This is what the method looks like in the source code:

def parse(self, response):
    raise NotImplementedError

Your indentation was wrong, so parse was defined at module level rather than as part of the class, and therefore you had not implemented the required method.

The raise NotImplementedError is there to ensure you write the required parse method when inheriting from the Spider base class.
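To see the effect without running Scrapy at all, here is a minimal sketch in plain Python. BaseSpider, BrokenSpider, FixedSpider and try_parse are hypothetical names made up for illustration; BaseSpider only imitates the placeholder parse shown above and is not Scrapy's actual class.

```python
class BaseSpider(object):
    """Hypothetical stand-in for a spider base class (not Scrapy's own)."""

    def parse(self, response):
        # The base class only provides a placeholder.
        raise NotImplementedError


class BrokenSpider(BaseSpider):
    name = "broken"


# Dedented to module level: this is a plain function, not a method
# of BrokenSpider, so the class still inherits the placeholder.
def parse(self, response):
    return "parsed " + response


class FixedSpider(BaseSpider):
    name = "fixed"

    # Indented inside the class body, so it overrides BaseSpider.parse.
    def parse(self, response):
        return "parsed " + response


def try_parse(spider, response):
    """Call spider.parse and report whether the placeholder was hit."""
    try:
        return spider.parse(response)
    except NotImplementedError:
        return "NotImplementedError"


broken_result = try_parse(BrokenSpider(), "page")
fixed_result = try_parse(FixedSpider(), "page")
print(broken_result)  # NotImplementedError
print(fixed_result)   # parsed page
```

The only difference between the two subclasses is the indentation of parse, which is exactly the difference between your code and the corrected version.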

You now just have to find the correct xpath ;)
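One thing to watch for when you work on the xpath: in Scrapy selectors, an expression that starts with // searches from the document root even when you call it on a row, whereas .// searches only inside that row. The sketch below illustrates the per-row idea using the standard library's xml.etree.ElementTree instead of Scrapy, so it runs without any install. The markup in SAMPLE is made up and is only an assumption about what such a table might look like, not the real page. ElementTree supports only a subset of XPath, so the tournament check is written as a separate step.

```python
import xml.etree.ElementTree as ET

# Made-up markup loosely imitating a fixtures table; not the real page.
SAMPLE = """
<table id="player-fixture">
  <tr><th>Header</th></tr>
  <tr>
    <td class="tournament">EPL</td>
    <td class="incidents"><span title="Goal">g</span><span title="Goal">g</span></td>
  </tr>
  <tr>
    <td class="tournament">EPL</td>
    <td class="incidents"><span title="Assist">a</span></td>
  </tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

goals_per_row = []
for row in root.findall(".//tr"):
    # Keep only rows that have a tournament cell.
    if row.find("td[@class='tournament']") is None:
        continue
    # The leading "." scopes the search to this row's descendants.
    goals = row.findall(".//span[@title='Goal']")
    goals_per_row.append(len(goals))

print(goals_per_row)  # [2, 0]
```

Each row reports only its own goal spans, which is what you want when printing the table one fixture at a time.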
