Python 2.7: Unable to parse the tree with BeautifulSoup4

Question

I am trying to parse this site to build five lists, one per weekday, each filled with one string per announcement. For example:

[in]   custom_function(page)

[out]  [[<MONDAYS    ANNOUNCEMENTS>],
        [<TUESDAYS   ANNOUNCEMENTS>],
        [<WEDNESDAYS ANNOUNCEMENTS>],
        [<THURSDAYS  ANNOUNCEMENTS>],
        [<FRIDAYS    ANNOUNCEMENTS>]]

But I can't figure out the right way to do this.

This is what I have so far:

from bs4 import BeautifulSoup
import requests
import datetime

# The URL must be a quoted string literal
url = 'http://mam.econoday.com/byweek.asp?day=7&month=4&year=2014&cust=mam&lid=0'

# Get the text of the webpage
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)

# First <table> element whose class is "eventstable"
full_table_1 = soup.find('table', 'eventstable')

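[Screenshot of the browser's developer tools]

I worked out that the content I want is inside the highlighted tag, but I'm not sure how to select that exact tag and then parse the times/announcements into a list. I've tried several approaches, but it only gets messier.

What should I do?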

Answer 1

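The idea is to find all the td elements with the class events, and then read the div elements with the class econoevents inside each one: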

data = []
for day in soup.find_all('td', class_='events'):
    data.append([div.text for div in day.find_all('div', class_='econoevents')])

print data

Prints:

[[u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
  u'4-Week Bill Announcement11:00 AM\xa0ET',
  u'3-Month Bill Auction11:30 AM\xa0ET',
  ...
 ],
 ...
]
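In the output above, each string has the announcement name fused directly to its time (for example `Measure8:30 AM\xa0ET`). If the name and time are wanted separately, one possible approach is to split each string with a regular expression. The sketch below is only an illustration: `split_event` and `EVENT_RE` are hypothetical names, and the pattern assumes every entry ends in a time of the form `H:MM AM/PM` followed by `ET`, as in the sample output. An announcement name that itself ends in a digit could be split incorrectly.

```python
import re

# Assumed layout: "<name><H:MM AM|PM><optional whitespace>ET".
# re.UNICODE lets \s match the non-breaking space (\xa0) seen in the output.
EVENT_RE = re.compile(r'^(?P<name>.*?)(?P<time>\d{1,2}:\d{2}\s*[AP]M)\s*ET$',
                      re.UNICODE)


def split_event(text):
    """Split a fused 'name + time' string into a (name, time) tuple.

    Returns (text, None) when no trailing time is found.
    """
    text = text.strip()
    match = EVENT_RE.match(text)
    if match is None:
        return (text, None)
    # Collapse any whitespace inside the time into single spaces
    time = u' '.join(match.group('time').split())
    return (match.group('name').strip(), time + u' ET')


# Strings copied from the sample output above
monday = [u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
          u'4-Week Bill Announcement11:00 AM\xa0ET',
          u'3-Month Bill Auction11:30 AM\xa0ET']

parsed = [split_event(item) for item in monday]
```

Applied to the nested `data` list from the answer, the same function could be mapped over each day, e.g. `[[split_event(s) for s in day] for day in data]`.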

Answer 1
0 Accepted 2014-04-06 02:42:12
