
Python 2.7: Can't figure out how to parse a tree with BeautifulSoup4

I am trying to parse this site to create five lists, one per weekday, each filled with one string per announcement. For example:

[in]   custom_function(page)

[out]  [[<MONDAYS    ANNOUNCEMENTS>],
        [<TUESDAYS   ANNOUNCEMENTS>],
        [<WEDNESDAYS ANNOUNCEMENTS>],
        [<THURSDAYS  ANNOUNCEMENTS>],
        [<FRIDAYS    ANNOUNCEMENTS>]]

But I can't figure out the correct way to do this.

This is what I have so far:

from bs4 import BeautifulSoup
import requests
import datetime

# The URL must be a quoted string, otherwise this line is a syntax error
url = 'http://mam.econoday.com/byweek.asp?day=7&month=4&year=2014&cust=mam&lid=0'

# Get the text of the webpage
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)


full_table_1 = soup.find('table', 'eventstable')

[Screenshot of the site's developer tools]

I figured out that what I want is in the highlighted tag, but I'm not sure how to get to that exact tag and then parse the times/announcements into a list. I've tried multiple methods, but it just keeps getting messier.

What do I do?

The idea is to find all td elements with the class events, then read the text of the div elements with the class econoevents inside each one:

data = []
for day in soup.find_all('td', class_='events'):
    data.append([div.text for div in day.find_all('div', class_='econoevents')])

print data

prints:

[[u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
  u'4-Week Bill Announcement11:00 AM\xa0ET',
  u'3-Month Bill Auction11:30 AM\xa0ET',
  ...
 ],
 ...
]
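
Note that each string in the output above has the event name fused with its time (for example "Measure8:30 AM"). If you want them separated, here is a minimal sketch using only the re module. The helper name split_event and the assumed layout (name, then a 12-hour time, then a short timezone code) are my own guesses based on the three sample strings shown, not something the site documents:

```python
import re

# Assumed layout: event name, then a 12-hour time, then a timezone code.
_EVENT_RE = re.compile(r'^(.*?)\s*(\d{1,2}:\d{2}\s*[AP]M)\s*([A-Z]{2,4})\s*$')


def split_event(text):
    """Return (name, time) for one announcement; time is None if not found."""
    # Swap the non-breaking space for a plain one so whitespace matching
    # behaves the same on Python 2 and Python 3.
    cleaned = text.replace(u'\xa0', u' ').strip()
    match = _EVENT_RE.match(cleaned)
    if match is None:
        return cleaned, None
    name, clock, zone = match.groups()
    return name, clock + u' ' + zone


sample = [u'Gallup US Consumer Spending Measure8:30 AM\xa0ET',
          u'4-Week Bill Announcement11:00 AM\xa0ET']
pairs = [split_event(item) for item in sample]
# pairs == [(u'Gallup US Consumer Spending Measure', u'8:30 AM ET'),
#           (u'4-Week Bill Announcement', u'11:00 AM ET')]
```

You could apply it inside the list comprehension from the answer, e.g. split_event(div.text). One caveat: an event name that ends in a digit would be ambiguous with this pattern, so check the results against the page.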
