Python: scrape data from an .aspx web page
I want to scrape the table from this webpage into a pandas DataFrame: https://www.perfectgame.org/College/CollegePlayerReports.aspx
I've tried both requests and requests-HTML, but neither seems to work:
from requests_html import HTMLSession
from requests import *
from bs4 import BeautifulSoup
import pandas as pd

def get_stats(name, year):
    with HTMLSession() as s:
        source = 'https://www.perfectgame.org/College/CollegePlayerReports.aspx'
        response = s.get(source)
        table = response.html.find('table.Grid', first=True)
        df = pd.read_html(table.html, header=0)[0]
        print(df)
Any solutions?
To get the data from the table into a pandas DataFrame, you can use the following example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Each data row of the grid carries the class rgRow or rgAltRow.
data = []
for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
    # Join the text of all cells with "|" and split it back into a list.
    data.append(row.get_text(strip=True, separator="|").split("|"))

df = pd.DataFrame(
    data,
    columns=["Reports", "Draft Eligible", "Class", "College", "Report Date"],
)

# to_markdown() requires the optional "tabulate" package.
print(df.to_markdown(index=False))
Prints:
| Reports | Draft Eligible | Class | College | Report Date |
|---|---|---|---|---|
| Drew Williamson | 2022 | Senior | Alabama | 6/1/2022 |
| Caden Rose | 2023 | Sophomore | Alabama | 6/1/2022 |
| Wyatt Langford | 2023 | Sophomore | Florida | 6/1/2022 |
| Nick Ficarrotta | 2022 | Freshman | Florida | 6/1/2022 |
| Fisher Jameson | 2024 | Freshman | Florida | 6/1/2022 |

... ...
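One caveat with joining a row's text on "|" and splitting it again: Beautiful Soup skips empty strings when `strip=True`, so a row with an empty cell yields fewer values than there are columns. A possible variation, shown here only as a sketch, reads each `<td>` separately so that empty cells survive as empty strings. The HTML fragment and the helper name `rows_from_grid` are made up for illustration; the fragment only imitates the `rgRow` / `rgAltRow` markup and is not copied from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that imitates the grid markup (not the real page).
SAMPLE_HTML = """
<table class="Grid">
  <tbody>
    <tr class="rgRow"><td>Player A</td><td>2022</td><td>Senior</td><td>College X</td><td>6/1/2022</td></tr>
    <tr class="rgAltRow"><td>Player B</td><td>2023</td><td></td><td>College Y</td><td>6/1/2022</td></tr>
  </tbody>
</table>
"""


def rows_from_grid(html):
    """Return one list of cell texts per data row, keeping empty cells."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
        # One entry per <td>, so an empty cell becomes "" instead of vanishing.
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


rows = rows_from_grid(SAMPLE_HTML)
for r in rows:
    print(r)
```

The resulting list of lists can be passed to `pd.DataFrame(rows, columns=[...])` in the same way as `data` above.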