How to extract the text after a <strong> tag inside an element

Question

I am trying to extract text from an element that looks like this:

<div><strong>"Beginning_of_text"</strong>"Rest_of_text"</div>

When I try to extract "Rest_of_text" in the Scrapy shell with

response.css("div::text").extract()

it gives me nothing. Do I have to use some special command to get the text that lies after a <strong> tag inside an element?

Answer 1

For just "Rest_of_text", you can use response.xpath('//div/strong/following-sibling::text()').get()
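To see why a sibling axis is involved at all, it may help to look at where a parser puts that text. The sketch below is not Scrapy code: it uses the standard library's xml.etree.ElementTree as a stand-in, on the assumption that the snippet is well-formed XML, and reads the text that follows the strong element from its tail attribute.

```python
import xml.etree.ElementTree as ET

# The markup from the question, assumed here to be well-formed XML.
snippet = '<div><strong>"Beginning_of_text"</strong>"Rest_of_text"</div>'

div = ET.fromstring(snippet)
strong = div.find("strong")

# ElementTree stores the text that directly follows an element's closing
# tag in that element's .tail attribute (None when nothing follows).
after_strong = strong.tail or ""
print(after_strong)
```

The text belongs to the div but sits after the strong child, which is what following-sibling::text() selects in the XPath version.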

Answer 2

Given the text you provided, the command you mentioned should have returned the following:

['"Rest_of_text"']

The problem may occur if there is whitespace before the strong tag, e.g.:

<div>   <strong>"Beginning_of_text"</strong>"Rest_of_text"</div>

In this case, if you execute the same command, you'll get this:

['   ', '"Rest_of_text"']

But if there's nothing after the strong tag, you'll get this:

['   ']
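If only the non-blank pieces matter, one option is to drop whitespace-only text nodes from the list the selector returned. This is a minimal sketch in plain Python; non_blank is a hypothetical helper name, and the input lists are the two results shown above.

```python
def non_blank(text_nodes):
    """Return the text nodes that contain something other than whitespace."""
    return [node for node in text_nodes if node.strip()]


# The two results shown above: with and without text after the strong tag.
with_trailing_text = non_blank(['   ', '"Rest_of_text"'])
without_trailing_text = non_blank(['   '])
print(with_trailing_text)
print(without_trailing_text)
```

An empty list then signals that nothing follows the strong tag, instead of a list holding only spaces.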

The best way I know to handle all these cases is the following:

>>> full_text = ''.join(response.xpath('//div//text()').extract())
>>> before_strong, after_strong = full_text.split(response.css('strong::text').extract_first(), 1)

So for the text you've provided, before_strong will be '' and after_strong will be '"Rest_of_text"', which seems to be what you want.
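The splitting step can also be tried without a Scrapy response. The sketch below uses a hypothetical helper, split_around, and hard-coded strings standing in for the joined div text and the strong text. It relies on str.partition, so the unpacking cannot fail when the marker is missing or occurs more than once, and it guards against a None marker, which is what extract_first() returns when there is no strong tag.

```python
def split_around(full_text, marker):
    """Split full_text into the parts before and after the first marker.

    Returns (full_text, "") when marker is empty, None, or not found.
    """
    if not marker:
        return full_text, ""
    before, _, after = full_text.partition(marker)
    return before, after


# Stand-ins for the joined text of the div and the text of the strong tag,
# using the variant with leading whitespace.
full_text = '   "Beginning_of_text""Rest_of_text"'
strong_text = '"Beginning_of_text"'

before_strong, after_strong = split_around(full_text, strong_text)
print(repr(before_strong), repr(after_strong))
```

Calling .strip() on either part afterwards would remove the leading whitespace if it is not wanted.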


2 answers

Answer 1
2 2018-11-07 12:41:21

Answer 2
0 Accepted 2018-11-06 12:03:09
