Scrapy spider returns different values than the XPath result in the browser console

Question

XPath:

//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()

HTML:

<ol class="breadcrumb container">
    <li class="first"><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li class="last"><a href="http://example.com/books?product_id=193" class="last"><span>My Vision : Challenges in the Race for Excellence - Mohammed Bin Rashid Al Maktoum</span></a></li>
</ol>

Python code:

# Placeholder values for up to six breadcrumb categories
categories = ['NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA', 'NO DATA']
catIndex = 0
# Select the breadcrumb items that are neither the first nor the last <li>
for cat in sel.xpath('//ol[@class="breadcrumb container"]/li[not(contains(@class,"first")) and not(contains(@class,"last"))]/a/span/text()').extract():
    categories[catIndex] = cat
    catIndex += 1

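The desired result is "Books". When I check the XPath in the Firebug console it returns the correct result, but when I run the spider it returns all three li elements, without excluding class="first" and class="last".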

I tried the command scrapy view http://example.com to see the page as the spider sees it, but everything looks the same there and the XPath returns the correct result.

When I try the XPath in the Scrapy shell, it returns the wrong result: all three li elements.

What could be the problem?

Answer 1

I opened the output of scrapy view http://example.com in Internet Explorer and found that the li elements have no class attribute.

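So the scrapy view output, when opened in Chrome or Firefox, does not show the real markup the spider sees, presumably because those browsers change the page after loading it (for example, a script adding the classes). With no class attributes on the li elements, the not(contains(@class, ...)) conditions are true for every li, which is why all three come back.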
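One way to confirm this kind of mismatch is to look at the class attributes in the raw HTML the spider downloads, rather than in a browser. The sketch below uses only the standard library and a hypothetical raw_html string that mimics the breadcrumb without class attributes on the li elements. The BreadcrumbParser class and the shortened third title are illustrative assumptions, not part of the original page. In a Scrapy shell the raw markup should be available as response.text.

```python
from html.parser import HTMLParser


class BreadcrumbParser(HTMLParser):
    """Collect the class attribute and text of each <li> in the raw HTML."""

    def __init__(self):
        super().__init__()
        self.items = []       # one [class_attr, text] pair per <li>
        self._in_li = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_li = True
            # dict(attrs).get("class") is None when the attribute is absent
            self.items.append([dict(attrs).get("class"), ""])

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_li = False

    def handle_data(self, data):
        if self._in_li:
            self.items[-1][1] += data.strip()


# Hypothetical raw response: same breadcrumb, but no class attributes on <li>
raw_html = """
<ol class="breadcrumb container">
    <li><a href="http://example.com/index.php?route=common/home"><span>Home</span></a></li>
    <li><a href="http://example.com/books"><span>Books</span></a></li>
    <li><a href="http://example.com/books?product_id=193"><span>My Vision</span></a></li>
</ol>
"""

parser = BreadcrumbParser()
parser.feed(raw_html)

# Class attribute of each <li> as it appears in the raw markup
li_classes = [cls for cls, _ in parser.items]
# Drop the first and last breadcrumb by position instead of by class
middle_categories = [text for _, text in parser.items[1:-1]]

print(li_classes)
print(middle_categories)
```

Selecting by position does not depend on classes that may be missing from the raw markup. In XPath the same idea could be written with the predicate li[position() > 1 and position() < last()] in place of the two not(contains(...)) conditions.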

Answer 1
0 2015-08-29 11:59:41
