Question on Beautiful Soup in Python
I have HTML like this:
<div class="content">Somedata</div>
<div class="content">Somedata</div>
<div class="content">Qualification</div>
<div class="content">THE DATA I WANT</div>
<div class="content">Somedata</div>
<div class="content">Somedata</div>
The same div tags then repeat again.
In this scenario there are no IDs or unique tags; every element is a div with the same class.
How do I get the "THE DATA I WANT" text, which comes right after "Qualification"? Thanks in advance.
from bs4 import BeautifulSoup

txt = '''
<div class="content">Somedata</div>
<div class="content">Somedata</div>
<div class="content">Qualification</div>
<div class="content">THE DATA I WANT</div>
<div class="content">Somedata</div>
<div class="content">Somedata</div>'''
soup = BeautifulSoup(txt, 'html.parser')
# soupsieve 2.1+ deprecates :contains in favour of :-soup-contains
print(soup.select_one('div:-soup-contains("Qualification") ~ div').text)
Prints:
THE DATA I WANT
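In the selector above, `~` matches every later sibling and `select_one` simply returns the first of them. The same idea can be written without CSS by using `find_next_sibling`. The following is only a minimal sketch, assuming the wanted value is always the next sibling div of the label; the helper name `value_after` and the shortened sample HTML are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened sample based on the HTML in the question.
html = '''
<div class="content">Somedata</div>
<div class="content">Qualification</div>
<div class="content">THE DATA I WANT</div>
<div class="content">Somedata</div>'''


def value_after(soup, label):
    """Return the text of the div right after the div whose text is `label`, or None."""
    # Match the label div by its exact text, ignoring surrounding whitespace.
    label_div = soup.find(lambda t: t.name == 'div' and t.get_text(strip=True) == label)
    if label_div is None:
        return None
    # find_next_sibling skips the whitespace text nodes between the tags.
    nxt = label_div.find_next_sibling('div')
    return nxt.get_text(strip=True) if nxt is not None else None


soup = BeautifulSoup(html, 'html.parser')
print(value_after(soup, 'Qualification'))  # THE DATA I WANT
print(value_after(soup, 'Missing label'))  # None
```

Returning None instead of raising makes it easier to call the helper on pages where the label may be absent.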
Or:
# string= is the current name for the older text= argument
print(soup.find(string="Qualification").find_next().text)
Or:
print(soup.find(lambda t: t.find_previous() and t.find_previous().text == 'Qualification').text)
EDIT: To iterate over the <div>s, you can use a simple for loop:
for item in soup.find_all(lambda t: t.name == 'div' and t.text == 'Qualification'):
    print(item.find_next().text)
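Since the question mentions that the same group of divs repeats, it may be useful to collect every value into a list rather than print them. This is a sketch under that assumption; the two-block sample HTML and the values in it are made up.

```python
from bs4 import BeautifulSoup

# Made-up sample: the same group of divs repeated twice.
html = '''
<div class="content">Somedata</div>
<div class="content">Qualification</div>
<div class="content">First value</div>
<div class="content">Somedata</div>
<div class="content">Qualification</div>
<div class="content">Second value</div>'''

soup = BeautifulSoup(html, 'html.parser')

divs = soup.find_all('div', class_='content')
# Pair each div with the one after it and keep the values that follow a label.
values = [
    nxt.get_text(strip=True)
    for cur, nxt in zip(divs, divs[1:])
    if cur.get_text(strip=True) == 'Qualification'
]
print(values)  # ['First value', 'Second value']
```

Pairing neighbours with `zip` avoids hard-coding any position, so it keeps working if extra divs appear before the label.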
You can try this:
from bs4 import BeautifulSoup
html_doc = '''<div class="content">Somedata</div>
<div class="content">Somedata</div>
<div class="content">Qualification</div>
<div class="content">THE DATA I WANT</div>
<div class="content">Somedata</div>
<div class="content">Somedata</div>'''
soup = BeautifulSoup(html_doc, 'lxml')
# Index 3 is the fourth matching div; this only works if its position never changes
result = soup.find_all("div", class_="content")[3].text
print(result)
Output will be:
THE DATA I WANT
OR
import re
soup = BeautifulSoup(html_doc, 'lxml')
# string= is the current name for the older text= argument
print(soup.find(string=re.compile('^THE DATA I WANT$')))
OR
print(soup.find(string="Qualification").find_next().text)
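One caveat with exact string matching: `find(string="Qualification")` only matches a text node that is exactly that string. If the real page has whitespace around the label, which is an assumption here since the question shows none, a regular expression is more forgiving. A minimal sketch with made-up sample HTML:

```python
import re
from bs4 import BeautifulSoup

# Made-up sample: the label has whitespace around it, as scraped pages often do.
html = '''
<div class="content"> Qualification
</div>
<div class="content">THE DATA I WANT</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Exact matching fails because the text node is " Qualification\n".
print(soup.find(string="Qualification"))  # None

# A regex tolerates leading and trailing whitespace around the label.
label = soup.find(string=re.compile(r'^\s*Qualification\s*$'))
print(label.find_next('div').get_text(strip=True))  # THE DATA I WANT
```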