Extracting full text from HTML span element with XPath expression

I have an HTML tree that looks like this:

<div id="RF4FOEQ3OPBEX" data-hook="review" class="a-section review aok-relative"><div 
   <div data-hook="review-collapsed" aria-expanded="false" class="a-expander-content reviewText review-text-content a-expander-partial-collapse-content">
      <span> 
             Text line1. 
             <br>
             Text line2. 
       </span>

I am trying to extract all the text from the span with the following XPath expression:

//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span/text()

However, this approach only returns the first line of text, up to the <br>. How can I extract the full text content of the span element? Any help would be much appreciated; thank you in advance.

Use //text() together with the getall() method to get all the text inside a specific element.

getall() returns a list, so just join it:

txt = "".join(response.xpath('//div[@data-hook="review"]//div[@data-hook="review-collapsed"]/span//text()').getall())
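The reason a single value comes back is that the <br> splits the span's content into two separate text nodes. In Scrapy, get() returns only the first matched node while getall() returns every match, and //text() additionally picks up text inside nested elements. The sketch below illustrates the same idea without Scrapy, using Python's standard-library xml.etree.ElementTree as a stand-in; the SNIPPET string is a simplified, well-formed version of the markup in the question (note the <br/>), and the names first_only and full_text are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the review markup in the question
# (an assumption: the real page is HTML and far larger).
SNIPPET = """
<div data-hook="review">
  <div data-hook="review-collapsed">
    <span>
      Text line1.
      <br/>
      Text line2.
    </span>
  </div>
</div>
"""

root = ET.fromstring(SNIPPET)
span = root.find('.//div[@data-hook="review-collapsed"]/span')

# Only the first text node: everything before <br/>.
first_only = span.text.strip()

# Every text node under the span, in document order,
# with surrounding whitespace trimmed and empty pieces dropped.
pieces = [t.strip() for t in span.itertext() if t.strip()]
full_text = " ".join(pieces)

print(first_only)  # Text line1.
print(full_text)   # Text line1. Text line2.
```

Stripping each piece before joining also removes the indentation and newlines that surround the text in the source markup; joining the raw getall() result with "" keeps that whitespace.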
