
Python BeautifulSoup HTML parsing

Hi everyone, I have a question about parsing HTML with BeautifulSoup. How do I parse this HTML:

<div class="time_table show_today" id="monday_schedule">
          <h3>January 20, 2014</h3>
                        <table>
                <tbody>
                <tr>
                  <th>Time</th>
                  <th>Program</th>
                </tr>

                    <tr>
                      <td class="time_part"> 0:00 </td>
                      <td class="show_content">
                        <h4>
                          First Up
                        </h4>
                        <p>
                          Bloomberg Television&#39;s award winning morning show takes a look at market openings in Asia and analyzes all the breaking news stories essential for your business day ahead.                        </p>
                      </td>
                    </tr>

                    <tr>
                      <td class="time_part"> 2:00 </td>
                      <td class="show_content">
                        <h4>
                          On the Move with Rishaad Salamat
                        </h4>
                        <p>
                          Rishaad Salamat brings you comprehensive coverage of market openings from Asia and live reporting on the stories most impacting business around the globe.                        </p>
                      </td>
                    </tr>

                    <tr>
                      <td class="time_part"> 4:00 </td>
                      <td class="show_content">
                        <h4>
                          Asia Edge
                        </h4>
                        <p>
                          Get to the bottom of the days major issues influencing business decisions with Rishaad Salamat. Asia Edge gives viewers a deeper perspective through extended interviews with the region&#39;s newsmakers as well as fast-paced panel discussions featuring Bloomberg&#39;s market reporters, business experts and influential guests. Stay ahead of the business day with Asia Edge.                        </p>
                      </td>
                    </tr>

My code is as follows:

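import urllib2  # Python 2 standard library

from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 is installed

url = 'http://www.bloomberg.com/tv/schedule/europe/'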

response = urllib2.urlopen(url)
soup = BeautifulSoup(response)

for line in soup.findAll('div',{'td','h4','p'}):
    print line

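What am I doing wrong in my code? Any advice would be great. The problem is that the page has <h3>January 20, 2014</h3>-style headings for about a week of days, but only one tag is picked up, and the loop does not print any of the other tags.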

I'm not sure what you are trying to achieve by passing {'td','h4','p'} as the second argument. That is a set, not a dict (as you may have thought).
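To make the set/dict distinction concrete, here is a small plain-Python sketch. The `attrs` and `tag_names` values are only illustrative assumptions, not something taken from the question's code:

```python
# Braces with bare items build a set; a dict needs key: value pairs.
as_written = {'td', 'h4', 'p'}
print(type(as_written).__name__)   # set

# Hypothetical dict form, e.g. an attribute filter for class="time_table".
attrs = {'class': 'time_table'}
print(type(attrs).__name__)        # dict

# If the goal was to match several tag names, a list of names is the usual
# way to express that (passed as the first argument to find_all/findAll).
tag_names = ['td', 'h4', 'p']
print(type(tag_names).__name__)    # list
```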

If you want to get the date, you can simply use soup.find('h3') here:

>>> print soup.find('h3')
<h3>January 20, 2014</h3>
>>> print soup.find('h3').text
January 20, 2014
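If the goal is to print every show for every day, one possible approach is to loop over each day's container and then over its table rows. The sketch below is written for Python 3 with BeautifulSoup 4 (the thread itself uses Python 2 print statements) and runs against an abbreviated copy of the HTML from the question, with the descriptions left out. It assumes that each day sits in its own div with class time_table, which the question only shows for Monday:

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the markup from the question (descriptions omitted).
html = """
<div class="time_table show_today" id="monday_schedule">
  <h3>January 20, 2014</h3>
  <table>
    <tbody>
      <tr><th>Time</th><th>Program</th></tr>
      <tr>
        <td class="time_part"> 0:00 </td>
        <td class="show_content"><h4> First Up </h4></td>
      </tr>
      <tr>
        <td class="time_part"> 2:00 </td>
        <td class="show_content"><h4> On the Move with Rishaad Salamat </h4></td>
      </tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

schedule = []
# Assumption: one div.time_table per day, each holding an <h3> date
# and a table of shows.
for day in soup.find_all("div", class_="time_table"):
    date = day.find("h3").get_text(strip=True)
    for row in day.find_all("tr"):
        time_cell = row.find("td", class_="time_part")
        if time_cell is None:
            continue  # the header row only has <th> cells
        title = row.find("h4").get_text(strip=True)
        schedule.append((date, time_cell.get_text(strip=True), title))

for entry in schedule:
    print(entry)
```

The show description could be collected the same way with row.find("p") when the full markup is parsed.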

