Extracting text from within tags in HTML using BeautifulSoup

Question

This is the portion of the HTML that contains the information I want to extract from a webpage. My intention is to extract just the names and values between the b tags. The result I expect is a list something like this: [On,DVI,396,2035,2551]

 ...   
<div class="txt"><br> 
Power: <b>On</b><br><br>
Source: <b>DVI</b><br><br>
Lamp runtime: <b>396</b> hours<br>
Lamp remaining: <b>2035</b> hours<br>
Total operation: <b>2551</b> hours<br>
</div>
...

What I tried was:

from bs4 import BeautifulSoup
import urllib2
url='ip address here'
html=urllib2.urlopen(url).read()
soup=BeautifulSoup(html)
main_div=soup.find("div",{"class":"txt"})
data=main_div.findAll('b').text

What went wrong? FYI, I am a beginner, so please bear with me.

Answer 1

Try:

data=[b.string for b in main_div.findAll('b')]
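
The likely cause of the error in the question is that findAll returns a list-like ResultSet of tags, and the collection itself has no .text attribute; only the individual tags do. A minimal self-contained sketch of the one-liner above, assuming the bs4 package on Python 3, the spelling find_all (the bs4 name for findAll), the built-in "html.parser" parser, and the sample markup from the question standing in for the fetched page:

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; it stands in for the downloaded page.
html = """<div class="txt"><br>
Power: <b>On</b><br><br>
Source: <b>DVI</b><br><br>
Lamp runtime: <b>396</b> hours<br>
Lamp remaining: <b>2035</b> hours<br>
Total operation: <b>2551</b> hours<br>
</div>"""

# Naming the parser explicitly is an assumption made here, not part of the question.
soup = BeautifulSoup(html, "html.parser")
main_div = soup.find("div", {"class": "txt"})

# Read .string from each <b> tag instead of from the whole result set.
data = [b.string for b in main_div.find_all("b")]
print(data)
```

Searching from main_div rather than from soup keeps the result limited to the b tags inside that one div, which matters if the rest of the page also uses b tags.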

Answer 2

Maybe something like this?

import BeautifulSoup

html = '''<div class="txt"><br> 
\nPower: <b>On</b><br><br>
\nSource: <b>DVI</b><br><br>
\nLamp runtime: <b>396</b> hours<br>
\nLamp remaining: <b>2035</b> hours<br>
\nTotal operation: <b>2551</b> hours<br>
\n</div>'''

soup = BeautifulSoup.BeautifulSoup(html)
bTags = [] 

for i in soup.findAll('b'):
    bTags.append(i.text)

Contents of bTags:

[u'On', u'DVI', u'396', u'2035', u'2551']
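
The snippet above imports the older BeautifulSoup 3 module, and the u'' prefixes in its output come from Python 2. A rough port of the same loop, assuming the bs4 package on Python 3 (the same import the question uses) and the built-in "html.parser" parser; the name b_tags is just a restyled bTags:

```python
from bs4 import BeautifulSoup  # bs4 package instead of the older `import BeautifulSoup`

html = """<div class="txt"><br>
Power: <b>On</b><br><br>
Source: <b>DVI</b><br><br>
Lamp runtime: <b>396</b> hours<br>
Lamp remaining: <b>2035</b> hours<br>
Total operation: <b>2551</b> hours<br>
</div>"""

soup = BeautifulSoup(html, "html.parser")
b_tags = []

# Collect the text of every <b> tag in the document, in document order.
for tag in soup.find_all("b"):
    b_tags.append(tag.get_text())

print(b_tags)
```

On Python 3 the printed strings carry no u prefix, since every str is already Unicode.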


Answer 1
2 | accepted | 2013-03-21 04:08:50

Answer 2
2 | 2013-03-21 05:12:23
