Here is the HTML that I want to scrape:
<dl class="some class">
<dt> <strong>Text1</strong></dt>
<dd> Result1</dd>
<dt> <strong>Text2</strong></dt>
<dd> Result2</dd>
<dt> <strong>Text3</strong></dt>
<dd> Result3</dd>
<dt> <strong>Text4</strong></dt>
<dd> Result4</dd>
. . .
</dl>
What I want is to get Result3, the value right next to Text3. In Selenium, I would do this with:
parent=driver.find_element_by_css_selector("dl.BuyingOptions-labeledValues")
elem=parent.find_element_by_xpath("//dt[contains(.,'Text3')]/following::dd[1]")
I now want to do the same thing with BeautifulSoup. I first tried:
parent=soup.find("dl","BuyingOptions-labeledValues")
This works fine, and print(parent.text) returns all the table text. Then I tried:
elem = parent.find("dt",string='Country Of Origin')
This is not working. Can someone please help? I am new to BeautifulSoup.
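A likely reason the string= lookup finds nothing, assuming the real markup has the same shape as the sample above: each <dt> holds a whitespace text node plus a <strong> child, so its .string is None and string= has nothing to compare against. The sketch below illustrates this on a cut-down copy of the sample; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

# Cut-down sample with the same shape as the markup in the question.
html = """
<dl class="some class">
<dt> <strong>Text3</strong></dt>
<dd> Result3</dd>
</dl>"""

soup = BeautifulSoup(html, "html.parser")

# <dt> has two children (a whitespace string and <strong>), so .string is None.
dt_string = soup.find("dt").string

# With a tag name given, string= is compared against each tag's .string,
# so this lookup finds nothing.
by_dt = soup.find("dt", string="Text3")

# <strong> has a single string child, so string= can match it directly.
strong = soup.find("strong", string="Text3")
result = strong.find_parent("dt").find_next_sibling("dd").get_text(strip=True)

print(dt_string, by_dt, result)
```

Searching on the inner <strong> and walking back up to the <dt> is one way around the problem; it only applies if the label is always wrapped in a single tag like this.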
You can use a CSS selector with bs4 4.7.1+: dt:contains("Text3") + dd. This selects the <dd> that is placed immediately after the <dt> that contains the text "Text3":
data = '''
<dl class="some class">
<dt> <strong>Text1</strong></dt>
<dd> Result1</dd>
<dt> <strong>Text2</strong></dt>
<dd> Result2</dd>
<dt> <strong>Text3</strong></dt>
<dd> Result3</dd>
<dt> <strong>Text4</strong></dt>
<dd> Result4</dd>
</dl>'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(data, 'html.parser')
print( soup.select_one('dt:contains("Text3") + dd').get_text(strip=True) )
Prints:
Result3
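In newer releases of soupsieve (the selector library behind select and select_one), :contains is deprecated in favour of :-soup-contains. As far as I know this spelling arrived in soupsieve 2.1, so treat the version as an assumption and check what you have installed. A minimal sketch of the same lookup with the newer pseudo-class:

```python
from bs4 import BeautifulSoup

data = '''
<dl class="some class">
<dt> <strong>Text1</strong></dt>
<dd> Result1</dd>
<dt> <strong>Text2</strong></dt>
<dd> Result2</dd>
<dt> <strong>Text3</strong></dt>
<dd> Result3</dd>
<dt> <strong>Text4</strong></dt>
<dd> Result4</dd>
</dl>'''

soup = BeautifulSoup(data, 'html.parser')

# Same selector as above, with :-soup-contains instead of :contains.
# Assumes a soupsieve release that supports this pseudo-class.
dd = soup.select_one('dt:-soup-contains("Text3") + dd')
value = dd.get_text(strip=True) if dd is not None else None
print(value)
```

select_one returns None when nothing matches, hence the guard before get_text.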
Another method (using bs4 filtering):
print( soup.find(lambda t: t.name=='dt' and t.text.strip()=='Text3').find_next_sibling() )
Prints:
<dd> Result3</dd>
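If more than one value is needed, it may be simpler to pair every <dt> with its following <dd> once and then look values up by label. This is only a sketch, assuming each <dt> is followed by a <dd> sibling as in the sample; pairs is an illustrative name.

```python
from bs4 import BeautifulSoup

data = '''
<dl class="some class">
<dt> <strong>Text1</strong></dt>
<dd> Result1</dd>
<dt> <strong>Text2</strong></dt>
<dd> Result2</dd>
<dt> <strong>Text3</strong></dt>
<dd> Result3</dd>
<dt> <strong>Text4</strong></dt>
<dd> Result4</dd>
</dl>'''

soup = BeautifulSoup(data, 'html.parser')

# Map each label (stripped <dt> text) to the stripped text of the next <dd>.
pairs = {}
for dt in soup.select('dl dt'):
    dd = dt.find_next_sibling('dd')
    if dd is not None:  # skip a <dt> with no following <dd>
        pairs[dt.get_text(strip=True)] = dd.get_text(strip=True)

print(pairs['Text3'])
```

Because the keys are stripped text, the lookup is an exact match on the label rather than a substring match as with the CSS selector.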