How to fix a Scrapy spider that scrapes nothing

Question

The following spider creates a blank .xml file when run, instead of one containing the items needed. Can you spot the mistake(s)?

Please note, I'm an absolute amateur, so applying Occam's razor may lead to the easiest solution.

Spider code in arakaali.py:

import scrapy
from PoExtractor.items import PoextractorItem


class RedditSpider(scrapy.Spider):
    name = "arakaali"
    start_urls = [
        "https://pathofexile.gamepedia.com/Araku_Tiki"
    ]

    def parse(self, response):
        item = PoextractorItem()
        item["item_name"] = selector.xpath("//*[@id='mw-content-text']/span/span[1]/span[1]/text()[1]").extract()
        item["flavor_text"] = selector.xpath("//*[@id='mw-content-text']/span/span[1]/span[2]/span[3])").extract()
        yield item

Code of items.py:

import scrapy


class PoextractorItem(scrapy.Item):

    flavor_text = scrapy.Field()
    item_name = scrapy.Field()
    pass

Then I run the command scrapy crawl arakaali, but the result is a blank document.

The page I'm trying to extract data from is https://pathofexile.gamepedia.com/Araku_Tiki

Thanks in advance for any help.

Answer 1

Instead of response, you use a selector variable, which is not defined anywhere. The parse callback receives the page as response, so the XPath calls should be made on that. Running the code as posted should raise an error (a NameError).
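To illustrate why the mistake only shows up once the spider actually runs, here is a minimal sketch in plain Python, without Scrapy. FakeResponse, parse_fixed and run are hypothetical names made up for this example. The point is only that an undefined name inside a generator function raises NameError when the generator is consumed, not when the module is imported. Scrapy should log such a callback error and carry on, which would leave the output with no items.

```python
class FakeResponse:
    """Hypothetical stand-in for a Scrapy response (an assumption, not Scrapy's API)."""

    def xpath(self, query):
        # Pretend every query matches a single text node.
        return ["Araku Tiki"]


def parse(response):
    # Bug reproduced from the question: `selector` is never defined,
    # the argument is called `response`.
    name = selector.xpath("//span/text()")
    yield {"item_name": name}


def parse_fixed(response):
    # Fix: call xpath() on the response that was passed in.
    name = response.xpath("//span/text()")
    yield {"item_name": name}


def run(callback):
    """Consume the generator and report either the items or the error."""
    try:
        return list(callback(FakeResponse()))
    except NameError as exc:
        return "NameError: {}".format(exc)


broken_result = run(parse)
fixed_result = run(parse_fixed)
print(broken_result)
print(fixed_result)
```

If this matches what happens in the real crawl, the console log of scrapy crawl arakaali should contain a traceback ending in the same NameError.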

UPDATE:

You also have an error in the second XPath, "//*[@id='mw-content-text']/span/span[1]/span[2]/span[3])": remove the closing parenthesis at the end of the expression (after span[3]).

Answer 1: accepted, 2018-08-28 09:13:49
