Python Pandas - read_html: No tables found

I am very new to Python and am trying to do my own data analysis.

I am trying to parse data from this website: https://www.tsn.ca/nhl/statistics

I want to get the table as a DataFrame.

I tried this:

import pandas as pd

players_list_unclean = pd.read_html('https://www.sport.net.ca/hockey/nhl/players/?season=2021&?seasonType=reg&tab=Skaters')

I get the following error:

raise ValueError("No tables found")
ValueError: No tables found

I can see there is a table on the page, but for some reason it is not being read.

I found another Stack Overflow answer that recommends using Selenium:

pandas read_html ValueError: No tables found

However, when I tried to adapt that code, I could not find a table ID in the HTML page source. Does anyone know another way to do this? I have tried other websites, but I ultimately run into the same issue.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("https://www.wunderground.com/personal-weather-station/dashboard?ID=KMAHADLE7#history/tdata/s20170201/e20170201/mcustom.html")
# Selenium 4 removed the find_element_by_* helpers; use find_element(By..., ...) instead.
elem = driver.find_element(By.ID, "history_table")

head = elem.find_element(By.TAG_NAME, "thead")
body = elem.find_element(By.TAG_NAME, "tbody")

list_rows = []

# find_elements (plural) returns a list of rows; the singular form returns one element and cannot be iterated.
for items in body.find_elements(By.TAG_NAME, "tr"):
    list_cells = []
    for item in items.find_elements(By.TAG_NAME, "td"):
        list_cells.append(item.text)
    list_rows.append(list_cells)
driver.close()
```



If you right-click the table and choose Inspect, you will see that the "table" on that page does not actually use the HTML <table> element.

From the pandas documentation:

This function searches for <table> elements and only for <tr> and <th> rows and <td> elements within each <tr> or <th> element in the table.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_html.html

I don't think read_html will work on this page. You will probably need to find another data source.
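To check a page without opening the browser inspector, a rough sketch like the one below may help. It uses only the standard library's html.parser to collect the cells of any <table> elements, loosely mirroring what the documentation quote describes. It is not how pandas is implemented. The class name, the function name, and both HTML snippets are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every <table> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.tables = []       # one list of rows per <table> found
        self._row = None       # cells of the <tr> currently being read
        self._in_cell = False  # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._row.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def extract_tables(html):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


# Hypothetical markup: a real <table> versus a grid built from <div> elements.
real_table = (
    "<table><tr><th>Player</th><th>Goals</th></tr>"
    "<tr><td>A. Skater</td><td>10</td></tr></table>"
)
div_grid = (
    '<div class="row"><div class="cell">A. Skater</div>'
    '<div class="cell">10</div></div>'
)

print(extract_tables(real_table))  # one table with a header row and a data row
print(extract_tables(div_grid))    # empty list: there is no <table> to find
```

If the list comes back empty for the HTML you downloaded, read_html has nothing to parse either, which is consistent with the "No tables found" error above.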
