XPath: how to extract text before, inside, and after a <strong> element

Question

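I'm working on a Scrapy spider that uses XPath to extract the information I need. The source page is generated by the site's search function. For example, I'm interested in items with "computer" in the title. On the source page, every "computer" is shown in bold as a result of the search. "computer" may appear at the beginning, in the middle, or at the end of the title, and some item titles don't contain "computer" at all. See the examples below: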

Example 1: ("computer" at the beginning)
<a class="title" href="whatever1">
<strong> Computer </strong>
, used
</a>  

Example 2: ("computer" in the middle)
<a class="title" href="whatever2">
Low price
<strong> computer </strong>
, great deal
</a> 

Example 3: ("computer" at the end)
<a class="title" href="whatever3">
Don't miss this
<strong> Computer </strong>
</a>

Example 4: (no keyword of "computer")
<a class="title" href="whatever4">
Best laptop deal ever!      
</a>

The XPath I tried, .//a[@class="title"]/text(), only returns the part after the strong element. For the four examples above, I get the following results:

Example 1:
, used

Example 2:
, great deal

Example 3: (Nothing)


Example 4:
Best laptop deal ever!

I need an XPath expression that covers all four cases and collects the full title of each item.

Answer 1

The simplest approach is to select all the text nodes and join them:

"".join(response.xpath('.//a[@class="title"]//text()').extract())

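Note the double slash before text(): that is the key fix here. A single slash (/text()) selects only the text nodes that are direct children of the a element, whereas //text() also selects the text nodes nested inside strong.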

Solution 1
4 Accepted 2015-10-11 03:16:16
