Python BeautifulSoup HTML parsing

Question

Hi everyone, I have a question about parsing HTML with BeautifulSoup. How do I parse this HTML:

<div class="time_table show_today" id="monday_schedule">
          <h3>January 20, 2014</h3>
                        <table>
                <tbody>
                <tr>
                  <th>Time</th>
                  <th>Program</th>
                </tr>

                    <tr>
                      <td class="time_part"> 0:00 </td>
                      <td class="show_content">
                        <h4>
                          First Up
                        </h4>
                        <p>
                          Bloomberg Television&#39;s award winning morning show takes a look at market openings in Asia and analyzes all the breaking news stories essential for your business day ahead.                        </p>
                      </td>
                    </tr>

                    <tr>
                      <td class="time_part"> 2:00 </td>
                      <td class="show_content">
                        <h4>
                          On the Move with Rishaad Salamat
                        </h4>
                        <p>
                          Rishaad Salamat brings you comprehensive coverage of market openings from Asia and live reporting on the stories most impacting business around the globe.                        </p>
                      </td>
                    </tr>

                    <tr>
                      <td class="time_part"> 4:00 </td>
                      <td class="show_content">
                        <h4>
                          Asia Edge
                        </h4>
                        <p>
                          Get to the bottom of the days major issues influencing business decisions with Rishaad Salamat. Asia Edge gives viewers a deeper perspective through extended interviews with the region&#39;s newsmakers as well as fast-paced panel discussions featuring Bloomberg&#39;s market reporters, business experts and influential guests. Stay ahead of the business day with Asia Edge.                        </p>
                      </td>
                    </tr>

My code is as follows:

import urllib2
from bs4 import BeautifulSoup

url = 'http://www.bloomberg.com/tv/schedule/europe/'

response = urllib2.urlopen(url)
soup = BeautifulSoup(response)

for line in soup.findAll('div',{'td','h4','p'}):
    print line

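What am I doing wrong in my code? Any suggestions would be great. The problem is that the page has about a week's worth of day headings like <h3>January 20, 2014</h3>, and my code only picks up one tag; the loop does nothing to print the tags for all the other days.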

Answer 1

I'm not sure what you're trying to achieve with {'td','h4','p'} as the second argument. That is a set, not a dict (as you might have thought).
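A minimal sketch of the distinction, assuming bs4 (BeautifulSoup 4) with the built-in `html.parser` backend; the HTML is a cut-down copy of the question's snippet with the description shortened. A dict maps attribute names to values, while a list of tag names matches any of those tags:

```python
from bs4 import BeautifulSoup

# Cut-down copy of the question's markup (one day, one programme)
html = """
<div class="time_table show_today" id="monday_schedule">
  <h3>January 20, 2014</h3>
  <table><tr>
    <td class="time_part"> 0:00 </td>
    <td class="show_content"><h4>First Up</h4><p>...</p></td>
  </tr></table>
</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# Braces around bare items build a set, not a dict
print(type({'td', 'h4', 'p'}).__name__)  # set

# A dict maps attribute names to values: match divs by CSS class
days = soup.find_all('div', {'class': 'time_table'})
print([day['id'] for day in days])  # ['monday_schedule']

# A list of tag names matches any tag whose name is in the list
parts = soup.find_all(['td', 'h4', 'p'])
print([tag.name for tag in parts])  # ['td', 'td', 'h4', 'p']
```

The matches come back in document order, so the two cells of the row appear before the <h4> and <p> nested inside the second cell.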

If you want to get the date, you can simply use soup.find('h3') here:

>>> print soup.find('h3')
<h3>January 20, 2014</h3>
>>> print soup.find('h3').text
January 20, 2014
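Note that find returns only the first match. For the week-long page described in the question, one approach is to collect every day block with find_all and then read each day's rows. This is a sketch assuming bs4 and Python 3, and assuming (from the snippet) that each day is its own div with class time_table; the live URL is not fetched here, the second day in the sample HTML is a hypothetical placeholder, and parse_schedule is a name made up for this example:

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question's snippet. The second day
# (tuesday_schedule) is a hypothetical placeholder, not real page data.
html = """
<div class="time_table show_today" id="monday_schedule">
  <h3>January 20, 2014</h3>
  <table><tbody>
    <tr><th>Time</th><th>Program</th></tr>
    <tr><td class="time_part"> 0:00 </td>
        <td class="show_content"><h4> First Up </h4><p>...</p></td></tr>
    <tr><td class="time_part"> 2:00 </td>
        <td class="show_content"><h4> On the Move with Rishaad Salamat </h4><p>...</p></td></tr>
  </tbody></table>
</div>
<div class="time_table" id="tuesday_schedule">
  <h3>January 21, 2014</h3>
  <table><tbody>
    <tr><th>Time</th><th>Program</th></tr>
    <tr><td class="time_part"> 0:00 </td>
        <td class="show_content"><h4> First Up </h4><p>...</p></td></tr>
  </tbody></table>
</div>
"""


def parse_schedule(markup):
    """Hypothetical helper: return {date: [(start, title), ...]} per day block."""
    soup = BeautifulSoup(markup, 'html.parser')
    schedule = {}
    # One div.time_table per day; class_ matches any one of the div's classes
    for day in soup.find_all('div', class_='time_table'):
        date = day.find('h3').get_text(strip=True)
        shows = []
        for row in day.find_all('tr'):
            start_cell = row.find('td', class_='time_part')
            if start_cell is None:  # header row has only <th> cells
                continue
            title = row.find('td', class_='show_content').find('h4')
            shows.append((start_cell.get_text(strip=True),
                          title.get_text(strip=True)))
        schedule[date] = shows
    return schedule


for date, shows in parse_schedule(html).items():
    print(date)
    for start, title in shows:
        print('  ', start, title)
```

Keying the result by the <h3> date keeps each programme attached to its day, which a single flat loop over td/h4/p tags would lose.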

Answer 1: score 0, accepted, 2014-01-23 09:07:42
