I would like to parse some HTML that looks like this:
<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Applicant</span>GC Planning Partnership Ltd </div>
<div>
<span>Agent</span>GC Planning Partnership Ltd </div>
<div>
<span>Wards</span>Springfield Ward </div>
<div>
<span>Location Co ordinates</span>Easting 534379 Northing 187690 </div>
<div>
<span>Parishes</span> </div>
<div>
<span>OS Mapsheet</span> </div>
<div>
Now, I don't want the text between the <span> tags, but rather the information right after them. From the example above, I would like to extract values like "FINAL DECISION", "Springfield Ward" or similar. I am very new to parsing HTML and I have no clue how to get there.
I would be very happy for any hint or idea!
Thanks a lot!
If you want the text after a span element with a specific text, you can find the span element by its text first and then get its .next_sibling:
soup.find("span", text="Current Status").next_sibling
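As a minimal, self-contained sketch of that lookup: the helper name value_after is hypothetical, the HTML is a shortened copy of the sample from the question, and string= is used here because it is the newer name for the text= argument (Beautiful Soup 4.4.0 and later). It also strips the trailing whitespace and returns None when the label is not found.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the sample markup from the question.
html = """
<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Wards</span>Springfield Ward </div>
"""

soup = BeautifulSoup(html, "html.parser")


def value_after(soup, label):
    """Return the stripped text node right after the <span> whose text is `label`.

    Hypothetical helper: returns None if there is no such span, or if the
    span is not directly followed by a text node.
    """
    span = soup.find("span", string=label)
    sibling = span.next_sibling if span is not None else None
    if not isinstance(sibling, NavigableString):
        return None
    return sibling.strip()


print(value_after(soup, "Current Status"))
print(value_after(soup, "Wards"))
```

Checking for None matters because calling .next_sibling directly on the result of find() raises an AttributeError when no span matches.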
If, on the other hand, you want to loop over all the span elements and get the next text sibling for each of them:
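from bs4 import BeautifulSoup, NavigableString

# soup is assumed to be the BeautifulSoup object built from the page
for span in soup.find_all("span"):
    next_text = span.next_sibling
    # only print siblings that are plain text, not tags
    if isinstance(next_text, NavigableString):
        print(next_text)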
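Building on the loop above, here is a sketch that collects every label and its value into a dictionary. The function name extract_fields is hypothetical, and it assumes each span is immediately followed by its value as plain text, as in the question's sample. Empty values such as "Parishes" come out as empty strings.

```python
from bs4 import BeautifulSoup, NavigableString

# Part of the sample markup from the question.
html = """
<div>
<span>Current Status</span>FINAL DECISION </div>
<div>
<span>Wards</span>Springfield Ward </div>
<div>
<span>Location Co ordinates</span>Easting 534379 Northing 187690 </div>
<div>
<span>Parishes</span> </div>
"""


def extract_fields(html):
    """Map each <span> label to the stripped text that directly follows it."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for span in soup.find_all("span"):
        sibling = span.next_sibling
        # Use the text only if the next node is plain text rather than a tag.
        value = sibling.strip() if isinstance(sibling, NavigableString) else ""
        fields[span.get_text(strip=True)] = value
    return fields


fields = extract_fields(html)
for label, value in fields.items():
    print(f"{label}: {value}")
```

A dictionary makes it easy to look up a single field later, for example fields["Wards"], without searching the document again.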