XPath works perfectly in browser console but returns NULL in Python Scrapy
Scrapy spider returns different values compared to browser console XPath result
XPath:
//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()
HTML:
<ol class="breadcrumb container">
<li class="first"><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
<li><a href="http://example.com/books"><span>Books</span></a></li>
<li class="last"><a href="http://example.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>
Python code:
# Placeholder slots; each extracted breadcrumb text overwrites one, in order.
categories = ['NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA']
catIndex = 0
# sel is the Scrapy selector built from the response.
for cat in sel.xpath('//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()').extract():
    categories[catIndex] = cat
    catIndex += 1
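The desired result is "Books". When I check the XPath in the Firebug console, it returns the correct result, but when I run the spider it returns all 3 li elements, without excluding class="first" and class="last".
I tried the command scrapy view http://example.com to see the page the way the spider sees it, but everything looked the same and the XPath returned the correct result.
When I try the XPath in the Scrapy shell, it returns the wrong result: all 3 li elements.
What could be the problem?
I opened the output of scrapy view http://example.com in Internet Explorer and found that the li elements have no class attribute.
So the scrapy view output, when opened in Chrome or Firefox, does not show the REAL code that the spider sees.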
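Since the markup the spider receives apparently has no class attribute on the li elements, one possible workaround is to pick the middle breadcrumbs by position instead of by class. In Scrapy, the XPath predicate li[position() > 1 and position() < last()] should express the same idea. The sketch below shows it with the standard library only, so it runs without Scrapy. RAW_HTML is an assumed reconstruction of what the spider sees (the breadcrumb from the question with the class attributes removed and the last title shortened), and middle_breadcrumbs is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed raw markup: the breadcrumb from the question, but with no class
# attributes on the li elements, as found when viewing the page in IE.
RAW_HTML = """
<ol class="breadcrumb container">
<li><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
<li><a href="http://example.com/books"><span>Books</span></a></li>
<li><a href="http://example.com/books?product_id=193"><span>My Vision</span></a></li>
</ol>
"""


def middle_breadcrumbs(markup):
    """Return breadcrumb labels, skipping the first and last items by position."""
    root = ET.fromstring(markup.strip())  # root is the <ol> element
    labels = [span.text for span in root.findall("./li/a/span")]
    # Drop the first ("Home") and last (product title) entries.
    return labels[1:-1]


categories = middle_breadcrumbs(RAW_HTML)
print(categories)
```

Selecting by position does not depend on attributes that may exist only in the browser's rendered DOM, though it does assume the first and last items are always the home link and the product title.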