
Python Web Scraping with Beautiful Soup

I'm trying to pull the entire table from the site below and store it as a dataframe, but I get an error when I try to pull all the rows. The table appears to have these attributes, so I'm not sure why this is happening.

import requests
from bs4 import BeautifulSoup

URL = "http://www.ercot.com/content/cdr/html/real_time_spp"
page = requests.get(URL).text
soup = BeautifulSoup(page, "lxml")

table = soup.find("table", attrs={"class": "tableStyle"})
table_data = table.tbody.find_all("tr")


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-241-362ee5fb0444> in <module>
      1 table = soup.find("table", attrs={"class": "tableStyle"})
----> 2 table_data = table.tbody.find_all("tr")

AttributeError: 'NoneType' object has no attribute 'find_all'

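The HTML for that page doesn't have a tbody element, which is why table.tbody is None. If you saw a tbody in your browser's inspector, that is because browsers insert one when they build the DOM; it is not part of the HTML that requests downloads.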
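To see this behaviour in isolation, here is a minimal sketch. It assumes a made-up inline HTML snippet (SAMPLE_HTML, not the actual ERCOT page, with invented column names and values) and uses the built-in "html.parser" so it runs without a network connection:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page's markup: the rows sit directly
# under <table>, with no <tbody> wrapper. The values are invented.
SAMPLE_HTML = """
<table class="tableStyle">
  <tr><th>Interval</th><th>Price</th></tr>
  <tr><td>0015</td><td>21.50</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", attrs={"class": "tableStyle"})

# There is no <tbody> in the source, so the attribute lookup returns None.
print(table.tbody)  # None

# Defensive pattern: fall back to the table itself when <tbody> is absent.
container = table.tbody if table.tbody is not None else table
rows = container.find_all("tr")
print(len(rows))  # 2
```

The fallback is only worth having if the same code must also handle pages that do include a tbody.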

You can get all the rows directly from the table using:

table = soup.find("table", attrs={"class": "tableStyle"})
# find_all is the current name; findAll is a deprecated alias
table_data = table.find_all("tr")
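Since the goal was a dataframe, here is one possible way to turn those rows into records. This is a sketch under assumptions: table_to_records is a hypothetical helper name, SAMPLE_HTML is made-up markup standing in for the real page, and the first row is assumed to hold the column headings.

```python
from bs4 import BeautifulSoup

# Made-up markup standing in for the real page; the values are invented.
SAMPLE_HTML = """
<table class="tableStyle">
  <tr><th>Interval</th><th>Price</th></tr>
  <tr><td>0015</td><td>21.50</td></tr>
  <tr><td>0030</td><td>22.75</td></tr>
</table>
"""


def table_to_records(html, css_class="tableStyle"):
    """Return the table's data rows as a list of dicts keyed by heading."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": css_class})
    if table is None:
        raise ValueError(f"no <table> with class {css_class!r} found")

    rows = table.find_all("tr")
    if not rows:
        return []

    # Assumption: the first row carries the headings (th or td cells).
    header = [cell.get_text(strip=True) for cell in rows[0].find_all(["th", "td"])]

    records = []
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all("td")]
        if values:  # skip rows that contain no data cells
            records.append(dict(zip(header, values)))
    return records


records = table_to_records(SAMPLE_HTML)
print(records)
```

If pandas is installed, pandas.DataFrame(records) builds a dataframe from this list of dicts. pandas.read_html can also parse HTML tables directly, which may be simpler if you don't need Beautiful Soup for anything else.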

