Beautiful Soup: just get the value inside the tag
The following command:
volume = soup.findAll("span", {"id": "volume"})[0]
gives:
<span class="gr_text1" id="volume">16,103.3</span>
when I issue a print(volume).
How do I get just the number?
Extract the string from the element:
volume = soup.findAll("span", {"id": "volume"})[0].string
Using a CSS selector:
>>> soup.select('span#volume')[0].text
u'16,103.3'
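For anyone who wants to try both answers side by side, here is a minimal sketch. It assumes Python 3 with bs4 installed and uses the built-in "html.parser"; the comma stripping and float conversion at the end are my own assumption about what "just the number" means, not something from the original question.

```python
from bs4 import BeautifulSoup

# The markup from the question, inlined so the example is self-contained.
html = '<span class="gr_text1" id="volume">16,103.3</span>'
soup = BeautifulSoup(html, "html.parser")

span = soup.find("span", id="volume")

via_string = span.string                             # the tag's single text child
via_text = span.get_text()                           # all text inside the tag
via_css = soup.select_one("span#volume").get_text()  # same element via CSS selector

# Assumption: the value uses "," as a thousands separator.
as_number = float(via_text.replace(",", ""))

print(via_string, via_text, via_css, as_number)
```

All three lookups return the same text here because the span has exactly one text child.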
Just to add, I also found that .string doesn't do well when there is a <br> in the text.
For example:
<div class = "Lines">
<span> First Line <br> Second Line <br> Third Line </span>
</div>
If we do soup.find("div", attrs={"class": "Lines"}).span.string we get None.
But with soup.find("div", attrs={"class": "Lines"}).span.text we get:
First Line Second Line Third Line
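A small sketch of the <br> case above, assuming Python 3, bs4, and the built-in "html.parser". It shows that .string is None once the span has several children, and two ways to still get the text: stripped_strings for the separate pieces, and get_text with a separator for one joined string.

```python
from bs4 import BeautifulSoup

html = (
    '<div class="Lines">'
    "<span> First Line <br> Second Line <br> Third Line </span>"
    "</div>"
)
soup = BeautifulSoup(html, "html.parser")
span = soup.find("div", attrs={"class": "Lines"}).span

# The span has several children (text, <br>, text, ...), so .string is None.
no_single_string = span.string

# Each text fragment, with surrounding whitespace stripped.
lines = list(span.stripped_strings)

# All fragments joined by a single space.
joined = span.get_text(" ", strip=True)

print(no_single_string, lines, joined)
```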
I think .string gives a NavigableString object and .text gives a unicode object (a plain str in Python 3).
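A quick way to check the types, as a sketch assuming Python 3 and bs4. In Python 3, NavigableString is itself a subclass of str, so both values compare equal to the plain text.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

soup = BeautifulSoup('<span id="volume">16,103.3</span>', "html.parser")
span = soup.find("span", id="volume")

string_value = span.string  # NavigableString (a str subclass tied to the tree)
text_value = span.text      # plain str

print(type(string_value).__name__, type(text_value).__name__)
```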
There is also an attribute for getting the value of the tag: tag.contents[0], the first child of the tag.
Try this:
volumes = soup('span')
for volume in volumes:
    print(volume.contents[0])
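One caveat with contents[0], sketched below assuming Python 3 and bs4: an empty tag has an empty contents list, so indexing it raises IndexError, and the first child can be another tag rather than text. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one normal span, one empty span, one span starting with a tag.
html = (
    '<span id="volume">16,103.3</span>'
    "<span></span>"
    "<span><b>bold</b> tail</span>"
)
soup = BeautifulSoup(html, "html.parser")

first_children = []
for span in soup("span"):       # soup("span") is shorthand for soup.find_all("span")
    if span.contents:           # skip empty tags instead of raising IndexError
        first_children.append(span.contents[0])

for child in first_children:
    print(repr(child))
```

Here the second entry is the <b> tag itself, not a string, so .get_text() is usually the safer choice when you only want text.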