Parsing returned HTML with BeautifulSoup

Question

I am trying to parse some HTML using Beautiful Soup.

At one point I search for a specific div tag, as in:

print(soup.find("div", {"class": "sorteringsvalg Alle"}))

and the output returned is as follows:

<div class="sorteringsvalg Alle"> Alle  <label class="sorteringtype">
<input checked="" name="type" type="radio" value="Alle"/>(638) </label></div>

What I am interested in is the number in brackets, (638), so I need to process this data further. I tried using regular expressions (the 're' module) on it, but the object returned is not a string, so that did not work.

Answer 1

Find the inner input and get its next sibling:

div = soup.find("div", {"class": "sorteringsvalg Alle"})
print(div.find("input", value="Alle").next_sibling.strip())

Or, in one go with a CSS selector:

soup.select("div.Alle input[value=Alle]")[0].next_sibling.strip()
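
For anyone who wants to try both variants, here is a minimal self-contained sketch. It assumes the markup is exactly the snippet from the question and uses the built-in "html.parser"; the variable names and the final int conversion are my own additions, not part of the answer above.

```python
from bs4 import BeautifulSoup

# Markup copied from the output shown in the question.
html = """<div class="sorteringsvalg Alle"> Alle  <label class="sorteringtype">
<input checked="" name="type" type="radio" value="Alle"/>(638) </label></div>"""

soup = BeautifulSoup(html, "html.parser")

# Variant 1: locate the div, then the input; the text after it is its next sibling.
div = soup.find("div", {"class": "sorteringsvalg Alle"})
via_sibling = div.find("input", value="Alle").next_sibling.strip()

# Variant 2: the same lookup expressed as a CSS selector.
via_selector = soup.select("div.Alle input[value=Alle]")[0].next_sibling.strip()

# Both give the text "(638)"; drop the brackets to get an integer.
number = int(via_sibling.strip("()"))

print(via_sibling, via_selector, number)
```

Note that next_sibling is a NavigableString (a string subclass), which is why .strip() works on it directly.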

Answer 2

You can get the string inside the tag like this:

print(soup.find("label").get_text(strip=True))
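
One caveat worth sketching: soup.find("label") returns the first label in the whole document, so on a page with several such divs it may help to search inside the target div instead. In the example below the first div is hypothetical (it is not in the question), and the regex step is just one way to pull out the digits once you have a plain string.

```python
import re
from bs4 import BeautifulSoup

# The first div is hypothetical (made up for illustration);
# the second div is the markup from the question.
html = """<div class="sorteringsvalg Annet"> Annet <label class="sorteringtype">
<input name="type" type="radio" value="Annet"/>(12) </label></div>
<div class="sorteringsvalg Alle"> Alle  <label class="sorteringtype">
<input checked="" name="type" type="radio" value="Alle"/>(638) </label></div>"""

soup = BeautifulSoup(html, "html.parser")

# Unscoped: find() returns the first <label> anywhere in the document.
first_label_text = soup.find("label").get_text(strip=True)

# Scoped: look up the div first, then search only inside it.
div = soup.find("div", {"class": "sorteringsvalg Alle"})
scoped_text = div.find("label").get_text(strip=True)

# get_text() returns a plain string, so the re module can be used on it.
match = re.search(r"\d+", scoped_text)
count = int(match.group()) if match else None

print(first_label_text, scoped_text, count)
```

This also covers the original problem with 're': find() returns a Tag object rather than a string, while get_text() gives you text that a regular expression can search.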


Answer 1: 0, accepted, 2015-08-10 19:16:39

Answer 2: 0, 2015-08-11 11:56:29
