
Index error with JavaScript parser

I am using Scrapy and the JavaScript parsing module 'slimit' to look for a particular JavaScript item within pages that I am crawling, like so:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector
from scrapy.item import Item
from scrapy.spider import BaseSpider
from slimit import ast
from slimit.parser import Parser
from slimit.visitors import nodevisitor


def get_fields(data):
    parser = Parser()
    tree = parser.parse(data)
    return {getattr(node.left, 'value', ''): getattr(node.right, 'value', '')
            for node in nodevisitor.visit(tree)
            if isinstance(node, ast.Assign)}


class ExampleSpider(CrawlSpider):
    name = "goal2"
    allowed_domains = ["whoscored.com"]
    start_urls = ["http://www.whoscored.com/"]


    rules = [Rule(SgmlLinkExtractor(allow=(''), deny=('')), callback='parse_item')]

    def parse_item(self, response):
        sel = Selector(response)
        script = sel.xpath('//div[@id="team-stage-stats"]/following-sibling::script/text()')
        if script is not None:
            script = script.extract()[0]

This works fine as long as the item is found on the page being crawled. If it isn't, I get an error saying the list index is out of range. I thought the 'is not None' check would handle this, but apparently it does not.

Can anyone see what I am doing wrong?

Thanks

It's likely that your xpath call is returning an empty list, not None. Changing your check to

if script is not None and len(script) > 0:  

should fix the issue. Or, more simply, you could rely on truthiness with just

if script:

Since both None and [] are falsy values, this does the same thing as its longer counterpart.
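To see why the original check passes and the indexing still fails, here is a minimal sketch in plain Python (no Scrapy required) that simulates an XPath query with no matches:

```python
# Simulate what an XPath query returns when nothing matches: an empty list.
script = []

# The original check passes, because an empty list is not None...
assert script is not None

# ...but indexing it still raises IndexError.
try:
    script[0]
except IndexError:
    pass  # this is exactly the "list index out of range" error

# Truthiness covers both failure modes at once:
assert not script   # [] is falsy
assert not None     # None is falsy too

if script:          # so this branch is safely skipped when nothing matched
    first = script[0]
```

As a side note, newer Scrapy versions offer `response.xpath(...).extract_first()` (and later `.get()`), which return None when nothing matches instead of raising, so the manual length check can often be avoided entirely.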
