Python BeautifulSoup HTML parse
Hi everyone, I have a question about parsing HTML with BeautifulSoup. How do I parse this HTML:
<div class="time_table show_today" id="monday_schedule">
<h3>January 20, 2014</h3>
<table>
<tbody>
<tr>
<th>Time</th>
<th>Program</th>
</tr>
<tr>
<td class="time_part"> 0:00 </td>
<td class="show_content">
<h4>
First Up
</h4>
<p>
Bloomberg Television's award winning morning show takes a look at market openings in Asia and analyzes all the breaking news stories essential for your business day ahead. </p>
</td>
</tr>
<tr>
<td class="time_part"> 2:00 </td>
<td class="show_content">
<h4>
On the Move with Rishaad Salamat
</h4>
<p>
Rishaad Salamat brings you comprehensive coverage of market openings from Asia and live reporting on the stories most impacting business around the globe. </p>
</td>
</tr>
<tr>
<td class="time_part"> 4:00 </td>
<td class="show_content">
<h4>
Asia Edge
</h4>
<p>
Get to the bottom of the days major issues influencing business decisions with Rishaad Salamat. Asia Edge gives viewers a deeper perspective through extended interviews with the region's newsmakers as well as fast-paced panel discussions featuring Bloomberg's market reporters, business experts and influential guests. Stay ahead of the business day with Asia Edge. </p>
</td>
</tr>
My code is as follows:
url = 'http://www.bloomberg.com/tv/schedule/europe/'
response = urllib2.urlopen(url)
soup = BeautifulSoup(response)
for line in soup.findAll('div',{'td','h4','p'}):
    print line
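What am I doing wrong in my code? Any suggestions would be great. The problem is that the <h3>January 20, 2014</h3> heading repeats for about a week of days, but my code only picks up one tag, and the loop does nothing to print all the other tags.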
I'm not sure what you are trying to achieve by passing {'td','h4','p'} as the second argument. That is a set, not a dict (as you may have thought).
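To make the set versus dict point concrete, here is a minimal sketch. It assumes Python 3 and the `bs4` package (the question itself uses Python 2), and the HTML string is a shortened stand-in for the page in the question, not the real response. It shows two forms that `find_all` does accept: a list of tag names as the first argument, and a dict of attributes as the second.

```python
from bs4 import BeautifulSoup

# Shortened sample modelled on the question's markup (an assumption, not the live page).
html = """
<div class="time_table">
<table>
<tr><td class="time_part"> 0:00 </td>
<td class="show_content"><h4>First Up</h4><p>Morning show.</p></td></tr>
</table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Braces without key: value pairs build a set; a dict needs key: value pairs.
print(type({'td', 'h4', 'p'}).__name__)
print(type({'class': 'time_part'}).__name__)

# A list of names as the first argument matches any of the listed tags.
tag_names = [tag.name for tag in soup.find_all(['td', 'h4', 'p'])]
print(tag_names)

# A dict as the second argument filters on attributes.
time_cells = soup.find_all('td', {'class': 'time_part'})
print([cell.get_text(strip=True) for cell in time_cells])
```

If the goal was to collect all `td`, `h4` and `p` tags, the list form is probably what was intended.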
If you want to get the date, you can simply use soup.find('h3') here:
>>> print soup.find('h3')
<h3>January 20, 2014</h3>
>>> print soup.find('h3').text
January 20, 2014
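Going one step further than the date, the rest of the schedule can be collected row by row. The following is only a sketch under a few assumptions: Python 3 with the `bs4` package, a shortened copy of the question's HTML with abbreviated descriptions, and a hypothetical `schedule` list introduced here for illustration. It pairs each day's `h3` date with the time and title of every program row.

```python
from bs4 import BeautifulSoup

# Shortened sample based on the question's HTML; descriptions are abbreviated.
html = """
<div class="time_table show_today" id="monday_schedule">
<h3>January 20, 2014</h3>
<table>
<tr><th>Time</th><th>Program</th></tr>
<tr>
<td class="time_part"> 0:00 </td>
<td class="show_content"><h4> First Up </h4><p> Morning show. </p></td>
</tr>
<tr>
<td class="time_part"> 2:00 </td>
<td class="show_content"><h4> On the Move with Rishaad Salamat </h4><p> Market coverage. </p></td>
</tr>
</table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

schedule = []  # hypothetical container: (date, time, title) tuples
for day in soup.find_all("div", class_="time_table"):
    date = day.find("h3").get_text(strip=True)
    for row in day.find_all("tr"):
        time_cell = row.find("td", class_="time_part")
        if time_cell is None:
            # The header row only has <th> cells, so skip it.
            continue
        content = row.find("td", class_="show_content")
        title = content.find("h4").get_text(strip=True)
        schedule.append((date, time_cell.get_text(strip=True), title))

for entry in schedule:
    print(entry)
```

Looping over every `div` with the `time_table` class means that, if the real page has one such block per day, each day's rows are tagged with their own date.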