beautifulsoup extract by class value text

Question

I want to extract paragraph data based on the text of the h2 elements, which all share the same class. Below is the HTML code.

<div class="myClass">
<div itemprop="reviewBody" class="review-body">
<h2 class="h3">Test1</h2><p>I want to extract this</p>
<h2 class="h3">Test2</h2><p>Dont want to extract</p>
<h2 class="h3">Test3</h2><p>I want to extract this too</p>
</div>
</div>

The output should look like this:

Test1    | I want to extract this
Test3    | I want to extract this too

Below is my code, but it extracts all of the headings (Test1, Test2, Test3). How can I extract data based on the h2 text?

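from bs4 import BeautifulSoup as bs

# `page` is assumed to be an HTTP response fetched earlier (e.g. with requests)
soup = bs(page.text, 'html.parser')
divs = soup.find_all(class_="myClass")

test1 = []

# Collects the text of every h2 with class "h3", with no filtering on the text
for item in divs[0].find_all('h2', class_="h3"):
    test1.append(item.text.strip())
print(test1)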

Answer 1

If I understand correctly, you'd like to apply an additional condition on the h2 text. You can use the text argument of .find_all(), which accepts a list of the strings you want to match (since Beautiful Soup 4.4.0 the same argument is also available under the name string), e.g.:

for h2 in soup.find_all('h2', class_='h3', text=['Test1', 'Test3']):
    print(h2.get_text())

If you also want the paragraph that follows each matched heading, you can use find_next_sibling():

for h2 in soup.find_all('h2', class_='h3', text=['Test1', 'Test3']):
    print(h2.find_next_sibling('p').get_text())
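Putting the two snippets together, here is a minimal, self-contained sketch that produces the heading/paragraph pairs asked for in the question. The HTML is pasted inline as a string instead of being fetched, the helper name extract_rows is hypothetical, and the sketch assumes Beautiful Soup 4.4.0 or later so that the string argument (the newer name for text) is available.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question (inline here instead of page.text)
html = """
<div class="myClass">
<div itemprop="reviewBody" class="review-body">
<h2 class="h3">Test1</h2><p>I want to extract this</p>
<h2 class="h3">Test2</h2><p>Dont want to extract</p>
<h2 class="h3">Test3</h2><p>I want to extract this too</p>
</div>
</div>
"""


def extract_rows(markup, wanted):
    """Hypothetical helper: return (heading, paragraph) pairs for the wanted h2 texts."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    # string= is the newer name of the text= argument used above
    for h2 in soup.find_all("h2", class_="h3", string=wanted):
        p = h2.find_next_sibling("p")  # first <p> sibling after this heading
        rows.append((h2.get_text(strip=True), p.get_text(strip=True) if p else ""))
    return rows


rows = extract_rows(html, ["Test1", "Test3"])
for heading, paragraph in rows:
    print(f"{heading} | {paragraph}")
```

Note that find_next_sibling('p') returns the next p sibling even if another h2 sits in between, so a heading without its own paragraph would pick up the paragraph of a later heading.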

solution1
0 ACCEPTED 2019-12-09 12:25:24