scrapy response.xpath only picking out the first item
I have this HTML structure:
<div class="column first">
<div class="detail">
<strong>Phone: </strong>
<span class="value"> 012-345-6789</span>
</div>
<div class="detail">
<span class="value">1 Street Address, Big Road, City, Country</span>
</div>
<div class="detail">
<h3 class="inline">Area:</h3>
<span class="value">Georgetown</span>
</div>
<div class="detail">
<h3 class="inline">Nearest Train:</h3>
<span class="value">Georgetown Station</span>
</div>
<div class="detail">
<h3 class="inline">Website:</h3>
<span class="value"><a href='http://www.website.com' target='_blank'>www.website.com</a></span>
</div>
</div>
When I run sel = response.xpath('//span[@class="value"]/text()') in the scrapy shell, I get the expected output:
[<Selector xpath='//span[@class="value"]/text()' data=u' 012-345-6789'>, <Selector xpath='//span[@class="value"]/text()' data=u'1 Street Address, Big Road, City, Country'>, <Selector xpath='//span[@class="value"]/text()' data=u'Georgetown Station'>, <Selector xpath='//span[@class="value"]/text()' data=u' '>, <Selector xpath='//span[@class="value"]/text()' data=u'January, 2016'>]
However, in the parse method of my spider, it only returns the first item:
def parse(self, response):
    def extract_with_xpath(query):
        return response.xpath(query).extract_first().strip()
    yield {
        'details': extract_with_xpath('//span[@class="value"]/text()')
    }
I know I am using extract_first(), but when I switch to extract(), which I know is a valid method, the code breaks.
What am I doing wrong? Do I need to loop over the extract_with_xpath('//span[@class="value"]/text()') call?
Thanks in advance!
In items.py, define the item:
from scrapy.item import Item, Field

class yourProjectNameItem(Item):
    # define the fields for your item here like:
    name = Field()
    details = Field()
In your spider, add these imports:
from scrapy.selector import HtmlXPathSelector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from yourProjectName.items import yourProjectNameItem
import re
The parse function looks like this:
def parse_item(self, response):
    hxs = HtmlXPathSelector(response)
    i = yourProjectNameItem()
    i['name'] = hxs.select('YourXPathHere').extract()
    i['details'] = hxs.select('YourXPathHere').extract()
    return i
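For the helper in the question, the likely reason extract() "breaks" is that it returns a list of strings, and a list has no strip() method, whereas extract_first() returns a single string. A minimal sketch of a list-aware helper follows. The name extract_all_with_xpath is hypothetical, and it assumes response is a Scrapy response whose xpath(query).extract() returns a list of strings:

```python
def extract_all_with_xpath(response, query):
    """Return every match for `query`, stripped, skipping whitespace-only matches.

    Hypothetical helper: assumes `response.xpath(query).extract()` returns
    a list of strings, as Scrapy selector lists do.
    """
    values = response.xpath(query).extract()  # a list, so .strip() must be applied per element
    return [value.strip() for value in values if value.strip()]
```

Inside parse this could be called as yield {'details': extract_all_with_xpath(response, '//span[@class="value"]/text()')}, which yields one item whose details field is a list of all matched values. In newer Scrapy releases, response.xpath(...) replaces HtmlXPathSelector and select().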
Hope this solves the problem. You can refer to my project on git: https://github.com/omkar-dsd/SRMSE/tree/master/Scrapers/NasaScraper