How to fix a Scrapy spider that yields nothing
The following spider creates a blank .xml file when run, instead of one containing the items I need. Can you spot the mistake(s)?
Please note that I'm an absolute amateur, so applying Occam's razor will probably lead to the easiest solution.
Spider code in arakaali.py:
import scrapy
from PoExtractor.items import PoextractorItem

class RedditSpider(scrapy.Spider):
    name = "arakaali"
    start_urls = [
        "https://pathofexile.gamepedia.com/Araku_Tiki"
    ]

    def parse(self, response):
        item = PoextractorItem()
        item["item_name"] = selector.xpath("//*[@id='mw-content-text']/span/span[1]/span[1]/text()[1]").extract()
        item["flavor_text"] = selector.xpath("//*[@id='mw-content-text']/span/span[1]/span[2]/span[3])").extract()
        yield item
Code of items.py:
import scrapy

class PoextractorItem(scrapy.Item):
    flavor_text = scrapy.Field()
    item_name = scrapy.Field()
    pass
Then I run the command
scrapy crawl arakaali
but the result is a blank document.
The page I'm trying to extract data from is https://pathofexile.gamepedia.com/Araku_Tiki
Thanks in advance for any help.
For some reason you are using a variable named selector instead of response. selector is never defined, so you should get an error when you run that code.
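To show what that error looks like, here is a minimal sketch in plain Python with no Scrapy involved. The standalone parse function and the XPath in it are made up for illustration; only the undefined selector name comes from the question. As far as I know, Scrapy logs an exception raised inside a callback and carries on with the crawl, which would explain why the output file is created but stays empty, so it is worth checking the crawl log for a traceback.

```python
def parse(response):
    # Same mistake as in the question: `selector` was never defined anywhere,
    # so Python raises NameError as soon as this line runs.
    return selector.xpath("//title/text()").extract()


caught = None
try:
    # The argument is a placeholder; the lookup of `selector` fails
    # before `response` is ever used.
    parse(response=None)
except NameError as exc:
    caught = exc

print(type(caught).__name__, "-", caught)
```

Replacing selector with response inside parse removes this error.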
UPDATE:
Your second XPath, "//*[@id='mw-content-text']/span/span[1]/span[2]/span[3])", also contains an error: remove the closing parenthesis at the end of the expression (after span[3]).