Extracting the full text from an HTML span element with an XPath expression

Question

I have an HTML tree that looks like this:

<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative">
   <div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
      <span>
         Text line1.
         <br>
         Text line2.
      </span>
   </div>
</div>

I am trying to extract all the text from the span with the following XPath expression:

//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()

However, this approach only returns the first line of text, up to the line break. How would I correctly extract the full text content of the HTML span tag? I would appreciate any help, and thank you in advance for the support.
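The <br> element splits the span's content into two separate text nodes ("Text line1." before it and "Text line2." after it), so span/text() matches two nodes rather than one string. A likely reason for seeing only the first line is reading the result with a method that returns just the first match, such as Scrapy's get() or extract_first(); this is an assumption, since the question does not show that call. The sketch below uses Python's standard-library ElementTree rather than Scrapy, and a well-formed copy of the span (<br/> instead of <br>), to show where each piece of text lives.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of the span from the question.
# Assumption: <br> is written as <br/> because ElementTree is an XML parser.
SPAN = """<span>
    Text line1.
    <br/>
    Text line2.
</span>"""

span = ET.fromstring(SPAN)

# Text before the first child element is stored on the span itself...
first_node = span.text.strip()
# ...while text after <br/> is stored as the <br/> element's "tail".
br = span.find("br")
second_node = br.tail.strip()

print(first_node)   # Text line1.
print(second_node)  # Text line2.
```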

Answer 1

Use //text() (every text node under the element, not just its direct children) and the getall() method (all matches, not just the first) to get all the text inside a specific element.

getall() returns a list, so just join it:

# response is the Scrapy Response object your spider callback receives
txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())
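For comparison outside Scrapy, here is a minimal sketch of the same idea with the standard-library ElementTree. The helper name full_text is hypothetical, and the span is again written as well-formed XML (<br/>). itertext() walks every descendant text node in document order, which corresponds to span//text(); the whitespace collapse at the end is an optional extra. Within Scrapy, an XPath-only alternative should be normalize-space(), e.g. response.xpath('normalize-space(//div[@data-hook="review-collapsed"]/span)').get(), which returns the span's string value with whitespace collapsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical helper (not part of Scrapy): an ElementTree analogue of
# "".join(span.xpath(".//text()").getall()).
def full_text(span_xml: str) -> str:
    span = ET.fromstring(span_xml)
    # itertext() yields every text node inside the element in document order,
    # including text that follows child elements such as <br/>.
    raw = "".join(span.itertext())
    # Collapse runs of whitespace and newlines, similar to normalize-space().
    return " ".join(raw.split())

SPAN = """<span>
    Text line1.
    <br/>
    Text line2.
</span>"""

print(full_text(SPAN))  # Text line1. Text line2.
```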

(Answer 1: votes 0, posted 2021-02-28 15:50:38)
