Parsing with Beautiful Soup

Question

I would like to parse HTML that looks like this:

<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Applicant</span>GC Planning Partnership Ltd </div>
<div>
<span>Agent</span>GC Planning Partnership Ltd </div>
<div>
<span>Wards</span>Springfield Ward </div>
<div>
<span>Location Co ordinates</span>Easting 534379 Northing 187690 </div>
<div>
<span>Parishes</span> </div>
<div>
<span>OS Mapsheet</span>  </div>
<div>

Now, I don't want the text inside the <span> tags, but rather the text that comes right after them. From the example above, I would like to extract values like "FINAL DECISION", "Springfield Ward" and so on. I am very new to parsing HTML and I have no idea how to get there.

I would be grateful for any hint or idea!

Thanks a lot!

Answer 1

If you want the text after a span element with a specific text, you can first find the span element by its text and then get its .next_sibling:

soup.find("span", string="Current Status").next_sibling.strip()
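For a complete picture, here is a minimal runnable sketch of the lookup above. It assumes the HTML is already available as a string (the variable name html is only for illustration) and uses the built-in "html.parser" backend; string= is the newer name for the text= argument of find().

```python
from bs4 import BeautifulSoup

# A trimmed copy of the HTML from the question (illustrative variable name)
html = """
<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Wards</span>Springfield Ward </div>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the <span> whose text is exactly "Current Status", then take the
# text node that immediately follows it inside the same <div>.
status = soup.find("span", string="Current Status").next_sibling

# The raw sibling keeps the trailing space from the source, so strip it.
print(status.strip())  # FINAL DECISION
```

The .strip() call matters because the text node is "FINAL DECISION " with a trailing space, exactly as it appears in the page source.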

If, though, you want to loop over all the span elements and, for each one, get the text sibling that follows it:

from bs4 import BeautifulSoup, NavigableString

for span in soup.find_all("span"):
    # The value is the text node directly after the <span>, if there is one
    next_text = span.next_sibling
    if isinstance(next_text, NavigableString):
        print(next_text.strip())
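Since the goal in the question is to get each value next to its label, the loop can also collect the pairs into a dictionary. This is a sketch under a few assumptions: the HTML is in a string named html (illustrative), and empty fields such as "Parishes" are stored as None, which is just one possible design choice.

```python
from bs4 import BeautifulSoup, NavigableString

# A trimmed copy of the HTML from the question (illustrative variable name)
html = """
<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Applicant</span>GC Planning Partnership Ltd </div>
<div>
<span>Wards</span>Springfield Ward </div>
<div>
<span>Parishes</span> </div>
"""

soup = BeautifulSoup(html, "html.parser")

details = {}
for span in soup.find_all("span"):
    # The label is the text inside the <span>
    label = span.get_text(strip=True)
    # The value is the text node that directly follows the <span>
    value = span.next_sibling
    if isinstance(value, NavigableString):
        # Strip whitespace; a whitespace-only value becomes None (assumption)
        details[label] = value.strip() or None
    else:
        details[label] = None

print(details)
```

With the sample above, details["Wards"] is "Springfield Ward" and details["Parishes"] is None.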

Answer 1: accepted, score 2, 2017-05-03 17:14:41
