
Parsing with Beautiful Soup

I would like to parse some HTML that looks like this:

<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Applicant</span>GC Planning Partnership Ltd </div>
<div>
<span>Agent</span>GC Planning Partnership Ltd </div>
<div>
<span>Wards</span>Springfield Ward </div>
<div>
<span>Location Co ordinates</span>Easting 534379 Northing 187690 </div>
<div>
<span>Parishes</span> </div>
<div>
<span>OS Mapsheet</span>  </div>
<div>

Now, I don't want the text between the <span> tags, but rather the text that comes right after each one. From the example above, I would like to extract values like "Final Decision", "Springfield Ward" or similar. I am very new to parsing HTML and I have no clue how to get there.

I would be very happy for any hint or idea!

Thanks a lot!

If you want the text after a span element with specific text, you can first find the span element by its text and then get its .next_sibling:

soup.find("span", text="Current Status").next_sibling
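As a minimal self-contained sketch of that lookup, assuming a shortened copy of the markup from the question is held in a string (the names `html`, `sibling` and `status` are only illustrative), the sibling can be checked and stripped of its trailing whitespace. It uses `string=`, the name newer Beautiful Soup releases prefer over `text=` for the same filter:

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the markup from the question (assumed input).
html = """<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Wards</span>Springfield Ward </div>"""

soup = BeautifulSoup(html, "html.parser")

# Find the label span by its exact text; find() returns None if it is absent.
span = soup.find("span", string="Current Status")
sibling = span.next_sibling if span is not None else None

# Only a text node carries the value; a tag or a missing sibling gives None.
status = sibling.strip() if isinstance(sibling, NavigableString) else None
print(status)
```

The explicit checks matter because `find()` returns `None` when no span matches, and the next sibling may be a tag rather than text.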

If, however, you want to loop over all the span elements and get the next text sibling of each one:

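from bs4 import BeautifulSoup, NavigableString

# `html` is assumed to hold the markup from the question
soup = BeautifulSoup(html, "html.parser")

for span in soup.find_all("span"):
    next_text = span.next_sibling
    # keep only text nodes; skip tags and spans with no following sibling
    if isinstance(next_text, NavigableString):
        print(next_text)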
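If the goal is to keep the values rather than print them, the same loop can collect them into a dictionary keyed by the span label. This is only a sketch under the assumption that each label appears once and that its value is the text node directly after the span; `extract_fields` is a hypothetical helper name, and the markup is a shortened copy of the question's:

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the markup from the question (assumed input).
html = """<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Applicant</span>GC Planning Partnership Ltd </div>
<div>
<span>Location Co ordinates</span>Easting 534379 Northing 187690 </div>
<div>
<span>Parishes</span> </div>"""


def extract_fields(markup):
    """Map each span label to the stripped text that follows the span."""
    soup = BeautifulSoup(markup, "html.parser")
    fields = {}
    for span in soup.find_all("span"):
        sibling = span.next_sibling
        # Whitespace-only or missing text nodes become an empty string.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        fields[span.get_text(strip=True)] = value
    return fields


fields = extract_fields(html)
for label, value in fields.items():
    print(f"{label}: {value}")
```

Empty fields such as "Parishes" come out as empty strings, so they can be filtered afterwards if they are not needed.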
