Python: scraping data from an .aspx web page

Question

I want to scrape the table from this webpage into a pandas DataFrame: https://www.perfectgame.org/College/CollegePlayerReports.aspx

I've tried both requests and requests-html, but neither seems to work:

from requests_html import HTMLSession
import pandas as pd


def get_stats(name, year):
    with HTMLSession() as s:
        source = 'https://www.perfectgame.org/College/CollegePlayerReports.aspx'
        response = s.get(source)
        table = response.html.find('table.Grid', first=True)
        df = pd.read_html(table.html, header=0)[0]
        print(df)

Any solutions?

Answer 1

To get the data from the table into a pandas DataFrame, you can use the following example:

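import requests
import pandas as pd
from bs4 import BeautifulSoup


url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
# Download the page and parse it with Python's built-in HTML parser.
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Select the grid's data rows, which carry the class rgRow or rgAltRow.
data = []
for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
    # Join the row's cell texts with "|" and split them back into a list.
    data.append(row.get_text(strip=True, separator="|").split("|"))

df = pd.DataFrame(
    data,
    columns=["Reports", "Draft Eligible", "Class", "College", "Report Date"],
)
# to_markdown() requires the optional "tabulate" package.
print(df.to_markdown(index=False))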

Prints:

Reports	Draft Eligible	Class	College	Report Date
Drew Williamson	2022	Senior	Alabama	6/1/2022
Caden Rose	2023	Sophomore	Alabama	6/1/2022
Wyatt Langford	2023	Sophomore	Florida	6/1/2022
Nick Ficarrotta	2022	Freshman	Florida	6/1/2022
Fisher Jameson	2024	Freshman	Florida	6/1/2022

...
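One possible refinement, offered as a sketch rather than as part of the original answer: joining a row's text with "|" and splitting it again drops empty cells, which could shift the remaining values into the wrong columns. Reading each <td> separately keeps one value per column. The sample markup, the player names, and the helper name rows_to_frame below are hypothetical; the sketch assumes each data row on the real page has exactly five direct <td> children, which has not been verified here.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup (not copied from the real page): two rows using
# the rgRow / rgAltRow classes, the second with an empty "Class" cell.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr class="rgRow"><td>Player A</td><td>2022</td><td>Senior</td><td>Alabama</td><td>6/1/2022</td></tr>
    <tr class="rgAltRow"><td>Player B</td><td>2023</td><td></td><td>Alabama</td><td>6/1/2022</td></tr>
  </tbody>
</table>
"""

COLUMNS = ["Reports", "Draft Eligible", "Class", "College", "Report Date"]


def rows_to_frame(html: str) -> pd.DataFrame:
    """Build a DataFrame from rgRow/rgAltRow rows, one value per <td>."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
        # Read each direct <td> child on its own so an empty cell
        # still occupies its column as an empty string.
        cells = row.find_all("td", recursive=False)
        data.append([td.get_text(strip=True) for td in cells])
    return pd.DataFrame(data, columns=COLUMNS)


df = rows_to_frame(SAMPLE_HTML)
print(df)
```

To try this against the live page, pass requests.get(url).text to rows_to_frame instead of SAMPLE_HTML; if the real rows have a different number of cells, COLUMNS would need adjusting.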

Answer 1: accepted 2022-08-19 23:50:24
