HTML parsing with Beautiful Soup
This is what my HTML looks like:
<table cellspacing="0" cellpadding="0" class="list04" style="width:704px;">
<td class="txt"><img src="img/1001.gif" /></td>
<td>
<div>string1</div>
<div>
string2</div>
</td>
<td><div class="name">string3</div>
</td>
<td>
</td>
<td></td>
</tr>
<tr>
<td></td>
<td class="txt"><img src="img/1002.gif" /></td>
<td>
<div>string4</div>
<div>
string5</div>
</td>
<td><div class="name">string6</div>
</td>
<td>
</td>
<td></td>
</tr>
<tr>
<td></td>
</table>
I want to extract the strings (string1 to string6) with Beautiful Soup.
Can anyone tell me how to do this?
Note: there are many other <div>s in the rest of the HTML, and I don't need them all. I want to extract the strings between <td class="txt"> and </td>.
If that markup is in the string html, use:
from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
print [t.text for t in soup.find("table", {"class": "list04"}).findAll("div")]
which will print out:
[u'string1', u'string2', u'string3', u'string4', u'string5', u'string6']
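The snippet above targets Python 2 and the old BeautifulSoup 3 package. As a rough sketch only, the same idea on Python 3 might look like the following, assuming the `bs4` package (Beautiful Soup 4) is installed. The shortened sample markup and the variable names `table` and `strings` are illustrative, not part of the original answer. Note that `get_text(strip=True)` is used because Beautiful Soup 4 does not trim whitespace by default.

```python
from bs4 import BeautifulSoup

# Shortened, illustrative version of the table from the question.
html = """
<table cellspacing="0" cellpadding="0" class="list04" style="width:704px;">
<tr>
<td class="txt"><img src="img/1001.gif" /></td>
<td>
<div>string1</div>
<div>
string2</div>
</td>
<td><div class="name">string3</div>
</td>
</tr>
</table>
"""

# Use the parser that ships with Python so no extra dependency is needed.
soup = BeautifulSoup(html, "html.parser")

# Restrict the search to the one table of interest, then collect its <div>s.
table = soup.find("table", class_="list04")
strings = [div.get_text(strip=True) for div in table.find_all("div")]
print(strings)
```

Scoping the search to the `list04` table is what keeps unrelated `<div>`s elsewhere on the page out of the result.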
Try this:
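from BeautifulSoup import BeautifulSoup
f = open('a.htm')
soup = BeautifulSoup(f)
# The <td class="txt"> cells hold only an <img>, so collect the <div>s
# from the cells that follow each of them in the same row.
divs = []
for td in soup.findAll('td', attrs={'class': 'txt'}):
    for cell in td.findNextSiblings('td'):
        divs.extend(cell.findAll('div'))
print [d.text for d in divs]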
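Since the question asks to skip `<div>`s that are unrelated to the `<td class="txt">` cells, one possible approach is to start from each such cell and look only at the cells that follow it in the same row. The sketch below assumes Python 3 with the `bs4` package installed. The sample markup (including the "unrelated" and "not wanted" `<div>`s) and the names `marker`, `cell`, and `wanted` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Illustrative markup: one row of interest plus <div>s that should be ignored.
html = """
<div>unrelated</div>
<table class="list04">
<tr>
<td></td>
<td class="txt"><img src="img/1001.gif" /></td>
<td>
<div>string1</div>
<div>
string2</div>
</td>
<td><div class="name">string3</div>
</td>
</tr>
<tr>
<td><div>not wanted</div></td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

wanted = []
# Each <td class="txt"> acts as a marker for a row we care about.
for marker in soup.find_all("td", class_="txt"):
    # Only the <td> cells after the marker, within the same parent row.
    for cell in marker.find_next_siblings("td"):
        wanted.extend(div.get_text(strip=True) for div in cell.find_all("div"))

print(wanted)
```

Rows without a `<td class="txt">` cell contribute nothing, so their `<div>`s are left out without any extra filtering.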