Scrapy exports invalid JSON
My parse method looks like this:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select("//tr/td")
    items = []
    for titles in titles:
        item = MyItem()
        item['title'] = titles.select('h3/a/text()').extract()
        items.append(item)
    return items
Why does it output JSON like this, with each title wrapped in a list:
[{"title": ["random title #1"]},
{"title": ["random title #2"]}]
titles.select('h3/a/text()').extract() returns a list, so the field ends up holding a list. Scrapy doesn't make any assumptions about your item's structure, so it exports the value exactly as you stored it.
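To see why the export has that shape without running a spider, here is a minimal sketch that uses only the standard json module. The plain dict stands in for the item and the hard-coded list stands in for what extract() returns; both are assumptions made for illustration.

```python
import json

# Stand-in for what .extract() returns: always a list of matched strings.
extracted = ["random title #1"]

# Storing the list as-is produces a JSON array in the export.
as_list = json.dumps({"title": extracted})

# Storing only the first element produces a plain JSON string.
as_string = json.dumps({"title": extracted[0]})

print(as_list)    # {"title": ["random title #1"]}
print(as_string)  # {"title": "random title #1"}
```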
The quick fix is to take just the first result:
item['title'] = titles.select('h3/a/text()').extract()[0]
A better solution is to use an item loader with TakeFirst() as the output processor:
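# Import paths from older Scrapy releases (scrapy.contrib).
from scrapy.contrib.loader import XPathItemLoader
from scrapy.contrib.loader.processor import TakeFirst, MapCompose

class YourItemLoader(XPathItemLoader):
    default_item_class = YourItemClass
    # Python 2: strip whitespace from every extracted value.
    default_input_processor = MapCompose(unicode.strip)
    # Keep only the first non-empty value for each field.
    default_output_processor = TakeFirst()
    # Per-field alternative: an input processor for 'title' only.
    # title_in = MapCompose(unicode.strip)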
Then load the item like this:
def parse(self, response):
    hxs = HtmlXPathSelector(response)
    for title in hxs.select("//tr/td"):
        loader = YourItemLoader(selector=title, response=response)
        loader.add_xpath('title', 'h3/a/text()')
        yield loader.load_item()
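For readers unfamiliar with the two processors, the following is a rough pure-Python approximation of what they do to the values collected for a field. The function names strip_each and take_first are hypothetical and this is not Scrapy's implementation, only a sketch of the behaviour.

```python
def strip_each(values):
    """Approximates MapCompose(unicode.strip): strip every extracted string."""
    return [value.strip() for value in values]


def take_first(values):
    """Approximates TakeFirst(): first value that is neither None nor ''."""
    for value in values:
        if value is not None and value != "":
            return value
    return None


# Stand-in for the list that add_xpath() would collect for one field.
collected = ["  random title #1 \n", "ignored second match"]

title = take_first(strip_each(collected))
print(title)  # random title #1
```

The input processor runs on each extracted value as it is added, and the output processor runs once when load_item() is called, which is why the exported field becomes a single string.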
As a simpler alternative, you can write a helper function like this:
def extractor(xpathselector, selector):
    """
    Helper function that extracts info from an xpathselector object
    using the selector constraint. Returns None when nothing matches.
    """
    val = xpathselector.select(selector).extract()
    return val[0] if val else None
and call it like this:
item['title'] = extractor(titles, 'h3/a/text()')
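Unlike indexing with [0], the helper tolerates cells where the XPath matches nothing. The sketch below exercises it with a hypothetical FakeSelector stub (not part of Scrapy) so that it runs without a spider or a network request.

```python
class FakeSelector(object):
    """Hypothetical stand-in for a Scrapy selector; not part of Scrapy."""

    def __init__(self, results, matched=None):
        # Maps an XPath string to the list of strings it would match.
        self._results = results
        self._matched = matched or []

    def select(self, xpath):
        # Return a new stub holding only the matches for this XPath.
        return FakeSelector(self._results, self._results.get(xpath, []))

    def extract(self):
        return list(self._matched)


def extractor(xpathselector, selector):
    """Return the first extracted value, or None when nothing matches."""
    val = xpathselector.select(selector).extract()
    return val[0] if val else None


with_title = FakeSelector({"h3/a/text()": ["random title #1"]})
without_title = FakeSelector({})

first = extractor(with_title, "h3/a/text()")
missing = extractor(without_title, "h3/a/text()")
print(first)    # random title #1
print(missing)  # None
```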