Scrape table element by row name using beautifulsoup

Question

Here is the HTML that I want to scrape:

<dl class="some class">
    <dt> <strong>Text1</strong></dt>
    <dd> Result1</dd>
    <dt> <strong>Text2</strong></dt>
    <dd> Result2</dd>
    <dt> <strong>Text3</strong></dt>
    <dd> Result3</dd>
    <dt> <strong>Text4</strong></dt>
    <dd> Result4</dd>
    .  .  .
</dl>

What I want is to get Result3, the value right next to Text3. In Selenium, I would do this with:

parent=driver.find_element_by_css_selector("dl.BuyingOptions-labeledValues")
elem=parent.find_element_by_xpath(".//dt[contains(.,'Text3')]/following::dd[1]")

I now want to do the same thing with BeautifulSoup. I first tried:

parent=soup.find("dl","BuyingOptions-labeledValues")

This works fine, and print(parent.text) prints all of the table text. Then I tried:

elem = parent.find("dt",string='Country Of Origin')

This does not work. Can someone please help? I am new to BeautifulSoup.

Answer 1

With bs4 4.7.1+ you can use the CSS selector dt:contains("Text3") + dd. It selects the <dd> that is placed immediately after a <dt> containing the text "Text3":

data = '''
<dl class="some class">
    <dt> <strong>Text1</strong></dt>
    <dd> Result1</dd>
    <dt> <strong>Text2</strong></dt>
    <dd> Result2</dd>
    <dt> <strong>Text3</strong></dt>
    <dd> Result3</dd>
    <dt> <strong>Text4</strong></dt>
    <dd> Result4</dd>
</dl>'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(data, 'html.parser')

print( soup.select_one('dt:contains("Text3") + dd').get_text(strip=True) )

Prints:

Result3
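
A note for newer installs: soupsieve, the selector engine behind select() and select_one(), has deprecated :contains() in favor of :-soup-contains(). As far as I know the rename arrived in soupsieve 2.1, so treat that version number as an assumption and check what you have installed. A minimal sketch of the same lookup with the newer spelling, using a shortened copy of the sample markup:

```python
from bs4 import BeautifulSoup

html = '''
<dl class="some class">
    <dt> <strong>Text2</strong></dt>
    <dd> Result2</dd>
    <dt> <strong>Text3</strong></dt>
    <dd> Result3</dd>
</dl>'''

soup = BeautifulSoup(html, 'html.parser')

# Assumption: the installed soupsieve understands :-soup-contains().
# On older versions, fall back to :contains() as in the answer above.
dd = soup.select_one('dt:-soup-contains("Text3") + dd')

# select_one() returns None when nothing matches, so guard before get_text().
result = dd.get_text(strip=True) if dd is not None else None
print(result)
```

The None check matters on real pages, where a label may be missing from some product listings.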

Further reading:

CSS Selectors Reference

Another method (using bs4 filtering):

print( soup.find(lambda t: t.name=='dt' and t.text.strip()=='Text3').find_next_sibling() )

Prints:

<dd> Result3</dd>
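
As for why the attempt in the question returned nothing: find("dt", string=...) compares against the tag's .string, and .string is None when a tag has more than one child. Here each <dt> holds a whitespace text node plus a <strong>, so no <dt> can match. The sketch below wraps the get_text() comparison in a small helper; value_for_label is a hypothetical name of my own, not part of bs4.

```python
from bs4 import BeautifulSoup

html = '''
<dl class="some class">
    <dt> <strong>Text2</strong></dt>
    <dd> Result2</dd>
    <dt> <strong>Text3</strong></dt>
    <dd> Result3</dd>
</dl>'''

soup = BeautifulSoup(html, 'html.parser')


def value_for_label(container, label):
    """Hypothetical helper: text of the <dd> that follows the <dt> labeled `label`."""
    for dt in container.find_all("dt"):
        # get_text() joins all descendant text, so the nested <strong> is fine.
        if dt.get_text(strip=True) == label:
            dd = dt.find_next_sibling("dd")
            return dd.get_text(strip=True) if dd is not None else None
    return None  # no <dt> with that label


parent = soup.find("dl")

# The approach from the question: each <dt> has two children, so .string is None.
print(parent.find("dt", string="Text3"))

print(value_for_label(parent, "Text3"))
```

Passing "dd" to find_next_sibling() keeps the lookup from picking up some other element if the markup ever puts one between the label and its value.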
