
Python - BS4 - How to grab text not wrapped in an HTML tag

<p>
Beth Weatherby, Ph.D.
<br/>
Chancellor<br/>
<a href="mailto:beth.weatherby@umwestern.edu">beth.weatherby@umwestern.edu</a>, (406) 683-7151</p>
I need to grab "Beth Weatherby" out of this HTML, then grab "Chancellor", and save the two to separate variables, using no library other than BS4. I tried next_sibling and similar approaches but could not get them to work.

Since you only know the exact text, search for it with the text="My text" argument (called string= since Beautiful Soup 4.4.0; text= is the older name), and then use find_next() to reach the text node that follows:

from bs4 import BeautifulSoup

html = '''<p>
Beth Weatherby, Ph.D.
<br/>
Chancellor<br/>
<a href="mailto:beth.weatherby@umwestern.edu">beth.weatherby@umwestern.edu</a>, (406) 683-7151</p>
'''

soup = BeautifulSoup(html, 'html.parser')


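# Match the text node by its content; string= is the current name for text=.
for tag in soup.find_all(string=lambda t: 'Beth Weatherby, Ph.D.' in t):
    name = tag.strip()
    # Next text node in document order, skipping the <br/> tag in between.
    title = tag.find_next(string=True).strip()
    print(name)
    print(title)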

Output:

Beth Weatherby, Ph.D.
Chancellor
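
If the position of the text inside the <p> is known rather than its exact wording, another option is to collect the paragraph's text nodes with stripped_strings and pick them by index. This is only a sketch: it assumes the name is always the first text node and the title the second, which holds for the HTML in the question but may not hold for other pages. The variable names parts, name, title and short_name are just illustrative.

```python
from bs4 import BeautifulSoup

html = '''<p>
Beth Weatherby, Ph.D.
<br/>
Chancellor<br/>
<a href="mailto:beth.weatherby@umwestern.edu">beth.weatherby@umwestern.edu</a>, (406) 683-7151</p>
'''

soup = BeautifulSoup(html, 'html.parser')

# stripped_strings yields each text node under <p> with surrounding
# whitespace removed, and skips nodes that are whitespace only.
parts = list(soup.p.stripped_strings)

# Assumption: first text node is the name, second is the title.
name, title = parts[0], parts[1]

# Drop the ", Ph.D." suffix if only the bare name is wanted.
short_name = name.split(',')[0]

print(name)
print(short_name)
print(title)
```

The remaining entries in parts are the link text and the phone number fragment, so the same list can be reused if those are needed as well.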
