Python 2.7: Can't figure out how to parse a tree with BeautifulSoup4
I am trying to parse this site to create 5 lists, one for each day, each filled with one string per announcement. For example:
[in] custom_function(page)
[out] [[<MONDAYS ANNOUNCEMENTS>],
[<TUESDAYS ANNOUNCEMENTS>],
[<WEDNESDAYS ANNOUNCEMENTS>],
[<THURSDAYS ANNOUNCEMENTS>],
[<FRIDAYS ANNOUNCEMENTS>]]
But I can't figure out the correct way to do this.
This is what I have so far:
from bs4 import BeautifulSoup
import requests
import datetime

# The URL must be a quoted string literal
url = 'http://mam.econoday.com/byweek.asp?day=7&month=4&year=2014&cust=mam&lid=0'

# Get the text of the webpage
r = requests.get(url)
data = r.text

# Parse the HTML and grab the table with class "eventstable"
soup = BeautifulSoup(data)
full_table_1 = soup.find('table', 'eventstable')
I figured out that what I want is in the highlighted tag, but I'm not sure how to get to that exact tag and then parse out the times/announcements into a list. I've tried multiple methods, but it just keeps getting messier.
What do I do?
The idea is to find all td elements with the events class, then read the div elements inside each one:
data = []
for day in soup.find_all('td', class_='events'):
    # One inner list per day, holding the text of each announcement div
    data.append([div.text for div in day.find_all('div', class_='econoevents')])
print data
prints:
[[u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
u'4-Week Bill Announcement11:00 AM\xa0ET',
u'3-Month Bill Auction11:30 AM\xa0ET',
...
],
...
]
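In the output above, each announcement name is fused with its time (for example 'Gallup US Consumer Spending Measure8:30 AM\xa0ET'). If the times need to be separated, a rough sketch along the following lines may help. It uses only the standard re module, and it assumes every string ends with a clock time followed by "ET", which may not hold for every entry on the page. The names TIME_RE and split_event are hypothetical and introduced here for illustration only.

```python
import re

# Assumed format: a clock time such as "8:30 AM", optional whitespace,
# then "ET" at the very end of the string.
TIME_RE = re.compile(r'(\d{1,2}:\d{2}\s*[AP]M)\s*ET$')


def split_event(text):
    """Return (name, time) for one announcement; time is None if no time is found."""
    # Replace the non-breaking space (\xa0) with a plain space first
    text = text.replace(u'\xa0', u' ').strip()
    match = TIME_RE.search(text)
    if match is None:
        return text, None
    return text[:match.start()].strip(), match.group(1)


# Sample data copied from the printed output above (one day only)
sample = [[u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
           u'4-Week Bill Announcement11:00 AM\xa0ET',
           u'3-Month Bill Auction11:30 AM\xa0ET']]

# Same nested shape as data: one list per day, one (name, time) tuple per announcement
parsed = [[split_event(item) for item in day] for day in sample]
print(parsed)
```

The same comprehension could be applied to the data list built earlier instead of sample. Entries that do not match the assumed format are returned unchanged with None as the time, so nothing is silently dropped.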