I'm trying to extract text from an element which looks like this:
<div><strong>"Beginning_of_text"</strong>"Rest_of_text"</div>
When I try to extract "Rest_of_text"
using Scrapy shell with
response.css("div::text").extract()
It gives me nothing. Do I have to use some special command to get to text that lies after a <strong>
tag inside an element?
For just "Rest_of_text", you can use response.xpath('//div/strong/following-sibling::text()').get()
Given the text you provided, the command you've mentioned should've returned the following:
['"Rest_of_text"']
The problem may occur if there is whitespace before the strong
tag, e.g.:
<div> <strong>"Beginning_of_text"</strong>"Rest_of_text"</div>
In this case, if you execute the same command, you'll get this:
[' ', '"Rest_of_text"']
But if there's nothing after the strong
tag, you'll get this:
[' ']
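If the whitespace-only nodes are the only nuisance, one option is to filter them out of the list before using it. A minimal sketch, with the three lists from above hard-coded in place of a live response; non_blank is a hypothetical helper name, not part of Scrapy:

```python
def non_blank(texts):
    """Drop text nodes that are empty or contain only whitespace."""
    return [t for t in texts if t.strip()]


# Lists copied from the three cases above, standing in for
# response.css("div::text").extract() on a live response.
no_whitespace = ['"Rest_of_text"']
leading_space = [' ', '"Rest_of_text"']
nothing_after = [' ']

print(non_blank(no_whitespace))   # ['"Rest_of_text"']
print(non_blank(leading_space))   # ['"Rest_of_text"']
print(non_blank(nothing_after))   # []
```

An empty list then means there was no real text next to the strong tag, so check for that before indexing into the result.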
The best way I know to handle all these cases is the following:
>>> full_text = ''.join(response.xpath('//div//text()').extract())
>>> # maxsplit=1 keeps the result at two parts even if the strong text repeats later
>>> before_strong, after_strong = full_text.split(response.css('strong::text').extract_first(), 1)
So in the text you've provided, before_strong will be equal to '' and after_strong will be equal to '"Rest_of_text"', which seems to be what you want to get.
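Unpacking the result of split into two names assumes the strong text is found, so it raises ValueError when the text is missing. A slightly more defensive sketch uses str.partition instead; split_around is a hypothetical helper name, and the strings below stand in for the values you would get from response.xpath('//div//text()') and response.css('strong::text') in the shell:

```python
def split_around(full_text, marker):
    """Return (before, after) around the first occurrence of marker.

    If marker is None, empty, or not found in full_text, the whole
    text is returned as 'before' and 'after' is an empty string.
    """
    if not marker:
        return full_text, ''
    before, _sep, after = full_text.partition(marker)
    return before, after


marker = '"Beginning_of_text"'

# The three cases discussed above, as joined text.
print(split_around('"Beginning_of_text""Rest_of_text"', marker))
# ('', '"Rest_of_text"')
print(split_around(' "Beginning_of_text""Rest_of_text"', marker))
# (' ', '"Rest_of_text"')
print(split_around(' "Beginning_of_text"', marker))
# (' ', '')
```

Since extract_first() returns None when there is no strong tag at all, the early return covers that case too.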