XPath: how to extract the text before, inside, and after a <strong> element

Question

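I am working on a Scrapy spider that uses XPath to extract the information I need. The source page is generated by the website's search function. For example, I am interested in items with "computer" in the title. On the source page, every occurrence of "computer" is shown in bold as a result of the search. "Computer" may appear at the beginning, in the middle, or at the end of a title, and some item titles do not contain "computer" at all. See the following examples: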

Example 1: ("computer" at the beginning)
<a class="title" href="whatever1">
<strong> Computer </strong>
, used
</a>  

Example 2: ("computer" in the middle)
<a class="title" href="whatever2">
Low price
<strong> computer </strong>
, great deal
</a> 

Example 3: ("computer" at the end)
<a class="title" href="whatever3">
Don't miss this
<strong> Computer </strong>
</a>

Example 4: (no keyword of "computer")
<a class="title" href="whatever4">
Best laptop deal ever!      
</a>

The XPath I tried, .//a[@class="title"]/text(), returns only the part after the strong element. For the four examples above, I get the following results:

Example 1:
, used

Example 2:
, great deal

Example 3: (Nothing)


Example 4:
Best laptop deal ever!

I need an XPath expression that covers all four cases and collects the complete title of each item.

Answer 1

The simplest approach is to select all the text nodes and join them:

"".join(response.xpath('.//a[@class="title"]//text()').extract())

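Note the double slash before text(); that is the key fix here. /text() matches only text nodes that are direct children of the a element, while //text() also matches text nodes nested inside descendants such as strong.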

Answer 1: score 4, accepted, 2015-10-11 03:16:16
