
How to get text and corresponding tag with BeautifulSoup?

I have some text containing HTML tags, something like:

from bs4 import BeautifulSoup
text = '<p>Some text</p> <h1>Some text</h1> ....'
soup = BeautifulSoup(text)

I parsed this text using BeautifulSoup. I would like to extract every sentence along with its text and corresponding tag. I tried:

for sent in soup:
    print(sent.text)  # ok
    print(sent.tag)   # not ok, since NavigableString does not have a tag attribute

I also tried soup.find_all() and got stuck at the same point: I have access to the text but not to the original tag.

Instead of .tag, use .name to get the element's tag name:

for tag in soup.find_all():
    print(tag.text, tag.name)

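Pass 'html.parser' explicitly as the second argument. Without it, BeautifulSoup picks the best parser installed, which is lxml when available, and lxml slightly reshapes the structure by wrapping partial HTML in <html> and <body>, so find_all() would also return those wrapper tags.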
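
To make the parser difference concrete, here is a minimal sketch that lists the tag names each parser produces for the same fragment. The helper tag_names is a hypothetical name introduced only for this illustration, and the lxml branch assumes the optional lxml package is installed; if it is not, the comparison is simply skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

html = '<p>Some text</p><h1>Some text</h1>'


def tag_names(markup, parser):
    """Hypothetical helper: names of all tags found after parsing with `parser`."""
    return [tag.name for tag in BeautifulSoup(markup, parser).find_all()]


# The built-in parser leaves the fragment as it is.
print(tag_names(html, 'html.parser'))  # ['p', 'h1']

try:
    # lxml is an optional third-party dependency and may not be installed.
    # When it is, it adds <html> and <body> wrappers around the fragment.
    print(tag_names(html, 'lxml'))  # ['html', 'body', 'p', 'h1']
except FeatureNotFound:
    print('lxml is not installed; skipping the comparison')
```

Naming the parser explicitly also keeps the result the same across machines, whichever parsers happen to be installed.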

Example

from bs4 import BeautifulSoup

html = '''<p>Some text</p><h1>Some text</h1>'''
soup = BeautifulSoup(html, 'html.parser')

for tag in soup.find_all():
    print(tag.text, tag.name)

Output

Some text p
Some text h1
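
If the real markup has nested tags, tag.text on an outer tag also includes the text of everything inside it, so the same sentence can be printed several times. One possible variation, sketched here on made-up sample markup (the nested HTML and the pairs list are assumptions for illustration, not part of the question), is to walk the text nodes and look up each one's direct parent:

```python
from bs4 import BeautifulSoup

# Made-up nested sample; the question's real markup may look different.
html = '<div><p>Some <b>bold</b> text</p><h1>Title</h1></div>'
soup = BeautifulSoup(html, 'html.parser')

pairs = []
# string=True matches every text node; .parent is the tag directly enclosing it.
for string in soup.find_all(string=True):
    text = string.strip()
    if text:  # skip whitespace-only nodes
        pairs.append((text, string.parent.name))

for text, name in pairs:
    print(text, name)

# Prints:
# Some p
# bold b
# text p
# Title h1
```

Each piece of text is reported once, paired with its innermost enclosing tag. Whether that or the per-tag output above is preferable depends on how the sentences should be grouped.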
