
Beautiful Soup: just get the value inside the tag

The following command:

volume = soup.findAll("span", {"id": "volume"})[0]

gives:

<span class="gr_text1" id="volume">16,103.3</span>

when I issue a print(volume).

How do I get just the number?

Extract the string from the element:

volume = soup.findAll("span", {"id": "volume"})[0].string
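
A minimal, self-contained sketch of the same idea, using the markup from the question. The conversion to a float at the end is an addition that assumes the comma is a thousands separator, as it is in this sample; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = '<span class="gr_text1" id="volume">16,103.3</span>'
soup = BeautifulSoup(html, "html.parser")

# find() returns the first match directly, so no [0] indexing is needed.
volume_tag = soup.find("span", {"id": "volume"})
volume_text = volume_tag.string  # the text inside the tag

# Assumption: the comma is a thousands separator, as in this sample.
volume_number = float(volume_text.replace(",", ""))
print(volume_text, volume_number)
```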

Using a CSS selector:

>>> soup.select('span#volume')[0].text
u'16,103.3'
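
If only the first match is needed, select_one() avoids the [0] indexing and returns None when nothing matches. The sketch below assumes the markup from the question; the second id is hypothetical and only there to show the no-match case.

```python
from bs4 import BeautifulSoup

html = '<span class="gr_text1" id="volume">16,103.3</span>'
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching tag, or None if there is none.
tag = soup.select_one("span#volume")
value = tag.get_text(strip=True) if tag is not None else None

# Hypothetical id that does not exist in the markup.
missing = soup.select_one("span#no-such-id")
print(value, missing)
```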

Just to add, I also found that .string doesn't work well when there is a <br> in the text.

For example:

<div class="Lines">
    <span> First Line <br> Second Line <br> Third Line </span>
</div>

If we do soup.find("div", attrs={"class": "Lines"}).span.string we get None.

But with soup.find("div", attrs={"class": "Lines"}).span.text we get:

 First Line Second Line Third Line

I think .string gives a NavigableString object (or None when the tag has more than one child), and .text gives a plain unicode string.
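
The behaviour described above can be reproduced with a short sketch. It assumes the built-in "html.parser" backend; stripped_strings and get_text() are shown as two ways to collect the pieces, and which one fits depends on whether the lines should stay separate.

```python
from bs4 import BeautifulSoup

html = '<div class="Lines"><span> First Line <br> Second Line <br> Third Line </span></div>'
soup = BeautifulSoup(html, "html.parser")
span = soup.find("div", attrs={"class": "Lines"}).span

# .string is None because the span has several children (text and <br> tags).
print(span.string)

# stripped_strings yields each text fragment with surrounding whitespace removed.
lines = list(span.stripped_strings)

# get_text() joins all fragments; here with a single space as the separator.
joined = span.get_text(" ", strip=True)
print(lines, joined)
```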

You can also read the tag's value from its contents list: tag.contents[0]

Try this:

# Calling soup('span') is shorthand for soup.find_all('span').
volumes = soup('span')
for volume in volumes:
    print(volume.contents[0])
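
One caveat with this approach: contents is an ordinary list of child nodes, so contents[0] raises IndexError on an empty tag and may be a nested tag instead of text. A rough sketch with hypothetical markup (not from the question) that guards against the empty case:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a plain span, an empty span, and a span that starts with a child tag.
html = '<span>16,103.3</span><span></span><span><b>bold</b> tail</span>'
soup = BeautifulSoup(html, "html.parser")

values = []
for span in soup("span"):
    # Take the first child only if the tag has any children at all.
    first = span.contents[0] if span.contents else None
    values.append(first)
print(values)
```

Here the first entry is the text, the second is None, and the third is the <b> tag itself, so .string or get_text() is usually the safer choice when only text is wanted.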
