Python - BS4 - how to grab text not wrapped in an html tag

Question

<p>
Beth Weatherby, Ph.D.
<br/>
Chancellor<br/>
<a href="mailto:beth.weatherby@umwestern.edu">beth.weatherby@umwestern.edu</a>, (406) 683-7151</p>

I need to grab "Beth Weatherby, Ph.D." out of this HTML, then grab "Chancellor", and save each to its own variable, without using any library other than BS4. I tried next_sibling and similar approaches but could not get them to work.

Answer 1

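Since the only thing you know is the exact text, search for it with the text="My text" argument (named string= in Beautiful Soup 4.4.0 and later; text= is the older name), and then use find_next() to get the text node that follows: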

from bs4 import BeautifulSoup

html = '''<p>
Beth Weatherby, Ph.D.
<br/>
Chancellor<br/>
<a href="mailto:beth.weatherby@umwestern.edu">beth.weatherby@umwestern.edu</a>, (406) 683-7151</p>
'''

soup = BeautifulSoup(html, 'html.parser')


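# find_all(text=...) returns the matching text nodes (NavigableString objects), not tags.
for node in soup.find_all(text=lambda t: 'Beth Weatherby, Ph.D.' in t):
    print(node.strip())
    # find_next(text=True) skips the <br/> tag and returns the next text node.
    print(node.find_next(text=True).strip())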

Output:

Beth Weatherby, Ph.D.
Chancellor
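
The question also asks for the two values to end up in separate variables. Here is a minimal sketch of one way to do that, assuming the name and the title are always the first two non-empty bare text nodes directly inside the <p>. The variable names direct_text, name and title are only illustrative and do not come from the original answer.

```python
from bs4 import BeautifulSoup, NavigableString

html = '''<p>
Beth Weatherby, Ph.D.
<br/>
Chancellor<br/>
<a href="mailto:beth.weatherby@umwestern.edu">beth.weatherby@umwestern.edu</a>, (406) 683-7151</p>
'''

soup = BeautifulSoup(html, 'html.parser')

# Keep only the bare text nodes that are direct children of <p>,
# skipping tags (<br/>, <a>) and whitespace-only strings.
direct_text = [
    child.strip()
    for child in soup.p.children
    if isinstance(child, NavigableString) and child.strip()
]

# Assumption: the first two bare strings are the name and the title.
name, title = direct_text[0], direct_text[1]

print(name)
print(title)
```

Because this filters on position rather than on the exact text, it may also work when the name is not known in advance, but it depends on the <p> keeping the same layout.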

solution1
0 2020-11-13 03:53:32
