
Parsing returned HTML with BeautifulSoup

I am trying to parse some HTML using Beautiful Soup.

At one point I search for a specific div tag, like this:

print(soup.find("div", {"class": "sorteringsvalg Alle"}))

and the output returned is as follows:

<div class="sorteringsvalg Alle"> Alle  <label class="sorteringtype">
<input checked="" name="type" type="radio" value="Alle"/>(638) </label></div>

What I am interested in is the number in brackets, so I need to process this data further. I've tried applying regular expressions from the 're' module, but the object returned by find() is a Tag rather than a string, so that did not work.

Find the inner input and get its next sibling, which is the text node holding the number:

div = soup.find("div", {"class": "sorteringsvalg Alle"})
print(div.find("input", value="Alle").next_sibling.strip())

Or, in one go with a CSS selector:

soup.select("div.Alle input[value=Alle]")[0].next_sibling.strip()
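To try both lookups without the original page, here is a minimal self-contained sketch. It assumes Python 3 and the built-in "html.parser" backend, and it uses the markup printed in the question as sample input; the variable names html, via_find and via_select are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html = '''<div class="sorteringsvalg Alle"> Alle  <label class="sorteringtype">
<input checked="" name="type" type="radio" value="Alle"/>(638) </label></div>'''

# Assumption: the built-in parser is good enough for this fragment.
soup = BeautifulSoup(html, "html.parser")

# Approach 1: find the div, then the input, then read the text node after it.
div = soup.find("div", {"class": "sorteringsvalg Alle"})
via_find = div.find("input", value="Alle").next_sibling.strip()

# Approach 2: the same lookup expressed as one CSS selector.
via_select = soup.select("div.Alle input[value=Alle]")[0].next_sibling.strip()

print(via_find)
print(via_select)
```

Both variables should hold the same bracketed text. Note that next_sibling only works here because the number directly follows the input element; if the markup changes, the sibling could be a different node.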

You can also get the string inside the tag, like this:

print(soup.find("label").get_text(strip=True))
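Since get_text() returns an ordinary string, the regular-expression approach mentioned in the question can be applied to it. The following is a rough sketch under the same assumptions as the question's sample markup (Python 3, built-in "html.parser"); the names label_text, match and count are hypothetical.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the output shown in the question.
html = '''<div class="sorteringsvalg Alle"> Alle  <label class="sorteringtype">
<input checked="" name="type" type="radio" value="Alle"/>(638) </label></div>'''

soup = BeautifulSoup(html, "html.parser")

# get_text() yields a plain string, so the re module can work on it.
label_text = soup.find("label").get_text(strip=True)

# Capture the digits between the parentheses.
match = re.search(r"\((\d+)\)", label_text)

# Fall back to None if the label does not contain a bracketed number.
count = int(match.group(1)) if match else None

print(count)
```

Converting with int() is optional; it is only useful if the number is needed for arithmetic or comparison rather than display.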
