python scrapy xpath: InternalError: (1136, u"Column count doesn't match value count at row 1")
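Here is my code. Crawling other URLs works fine, but when I crawl this URL I get an error saying the column count doesn't match. I don't understand why the length being counted is the character length rather than the dictionary length.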
import scrapy
from scrapy.selector import Selector
# JikeItem is defined elsewhere in the project (not shown)

class JikespiderSpider(scrapy.Spider):
    name = "jikespider"
    allowed_domains = ["fromgeek.com"]
    start_urls = ['http://www.fromgeek.com/topic/']

    def parse(self, response):
        sel = Selector(response)
        jike_list = sel.xpath('//ul[@id="masonry0"]')
        ll = len(sel.xpath('//ul[@id="masonry0"]/li'))
        for jike in range(ll):
            item = JikeItem()
            try:
                item['jike_title'] = jike_list.xpath('//li/div/div[@class="n-pic fl"]/a/@title').extract()[jike].strip()
                item['jike_uptime'] = jike_list.xpath('//li/div/div[@class="n-keytime "]/div[@class="time fr"]/text()').extract()[jike].strip()
                item['jike_tag'] = jike_list.xpath('//li/div/div[@class="n-keytime "]/div[@class="key fl"]').xpath('string(.)').extract()[jike].strip()
                print len(item['jike_title'])
                print len(item['jike_uptime'])
                print len(item['jike_tag'])
                print '--------------------------'
                yield item
            except Exception,e:
                print e
I could not reproduce your error message with your code (Scrapy 1.3.2, Python 2.7.11).
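On the lengths you print: len(item['jike_title']) measures the string stored in that one field, so it reports a character count. The number of fields would be len(item). A minimal sketch of the difference, assuming a plain dict as a stand-in for JikeItem and made-up field values:

```python
# A plain dict stands in for JikeItem here; the values are made up.
item = {
    'jike_title': u'Example title',
    'jike_uptime': u'2017-03-01',
    'jike_tag': u'tag-a tag-b',
}

title_length = len(item['jike_title'])  # characters in one field's string
field_count = len(item)                 # number of fields in the item

print(title_length)
print(field_count)
```

So the printed numbers vary with the text on each page, while the field count stays the same.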
I wonder why you build a counter to access the elements instead of looping over the selector list. Nested XPath queries make this much easier.
class JikespiderSpider(scrapy.Spider):
    name = "jikespider"
    allowed_domains = ["fromgeek.com"]
    start_urls = ['http://www.fromgeek.com/topic/']

    def parse(self, response):
        sel_jike_list = response.xpath('//ul[@id="masonry0"]/li')
        for sel_jike in sel_jike_list:
            item = JikeItem()
            item['jike_title'] = sel_jike.xpath('.//div[@class="n-pic fl"]/a/@title').extract_first()
            # ... other fields
            yield item
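Note the dot at the start of the nested XPath. It makes the query relative to sel_jike; without it, // searches from the document root again and matches elements in every list item.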