

Python: scrape data from an .aspx web page

I want to scrape the table from this web page into a pandas DataFrame: https://www.perfectgame.org/College/CollegePlayerReports.aspx

I've tried both requests and requests-HTML, but neither seems to work:

from requests_html import HTMLSession
import pandas as pd


def get_stats(name, year):
    with HTMLSession() as s:
        source = 'https://www.perfectgame.org/College/CollegePlayerReports.aspx'
        response = s.get(source)
        # find(..., first=True) returns None when no element matches the selector
        table = response.html.find('table.Grid', first=True)
        df = pd.read_html(table.html, header=0)[0]
        print(df)
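One likely failure point in the code above: when nothing on the page matches `table.Grid`, `find(..., first=True)` returns `None`, and `table.html` then raises `AttributeError`. Which class the grid table actually carries on this page is not confirmed here. A small diagnostic sketch (`describe_tables` is a hypothetical helper, not part of any library) lists the `id` and classes of every `<table>` in the fetched HTML, so you can pick a selector that really matches:

```python
from bs4 import BeautifulSoup


def describe_tables(html):
    """Return (id, class list) for every <table> element in the page.

    Hypothetical helper for checking which selector matches before calling
    pd.read_html. An empty result suggests the table is not present in the
    static HTML that was downloaded.
    """
    soup = BeautifulSoup(html, "html.parser")
    # "class" is a multi-valued attribute, so BeautifulSoup returns it as a list
    return [(t.get("id"), t.get("class", [])) for t in soup.find_all("table")]


# Usage (requires network access):
#   import requests
#   url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
#   for table_id, classes in describe_tables(requests.get(url).text):
#       print(table_id, classes)
```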

Any solutions?

To get the data from the table into a pandas DataFrame, you can use the following example:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Grid rows alternate between the classes "rgRow" and "rgAltRow"
data = []
for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
    # Join the row's text fragments with "|" and split them back into a list
    data.append(row.get_text(strip=True, separator="|").split("|"))

df = pd.DataFrame(
    data,
    columns=["Reports", "Draft Eligible", "Class", "College", "Report Date"],
)
# to_markdown() requires the optional "tabulate" package
print(df.to_markdown(index=False))
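The code above joins all text in a row with `"|"` and splits it again, which assumes every cell yields exactly one text fragment. A cell with nested markup (for example `Caden <b>Rose</b>`) or an empty cell can shift values into the wrong columns, or make the `DataFrame` constructor raise when a row yields more fields than there are column names. A minimal sketch that reads one value per `<td>` instead, assuming each grid row has exactly five direct `<td>` cells in the order of the column list (not verified against the live page); `parse_grid` and `COLUMNS` are hypothetical names:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Column order assumed to match the cell order in each grid row
COLUMNS = ["Reports", "Draft Eligible", "Class", "College", "Report Date"]


def parse_grid(html, columns=COLUMNS):
    """Build a DataFrame from the grid rows, reading one value per <td> cell."""
    soup = BeautifulSoup(html, "html.parser")
    data = []
    for row in soup.select("tbody tr.rgRow, tbody tr.rgAltRow"):
        # recursive=False keeps cells of any nested table out of this row;
        # get_text(" ", strip=True) merges a cell's fragments into one string
        cells = [td.get_text(" ", strip=True) for td in row.find_all("td", recursive=False)]
        data.append(cells)
    return pd.DataFrame(data, columns=columns)


# Usage (requires network access):
#   import requests
#   url = "https://www.perfectgame.org/College/CollegePlayerReports.aspx"
#   df = parse_grid(requests.get(url).text)
```

An empty cell now becomes an empty string in its own column instead of being dropped.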

Prints:

| Reports         | Draft Eligible | Class     | College | Report Date |
|:----------------|---------------:|:----------|:--------|:------------|
| Drew Williamson |           2022 | Senior    | Alabama | 6/1/2022    |
| Caden Rose      |           2023 | Sophomore | Alabama | 6/1/2022    |
| Wyatt Langford  |           2023 | Sophomore | Florida | 6/1/2022    |
| Nick Ficarrotta |           2022 | Freshman  | Florida | 6/1/2022    |
| Fisher Jameson  |           2024 | Freshman  | Florida | 6/1/2022    |

...
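Every column of `df` built this way holds strings, because `split()` returns strings. If you need numeric years and real dates for sorting or filtering, here is a minimal sketch (`convert_types` is a hypothetical helper) assuming the five column names used above and M/D/YYYY dates as in the printed output:

```python
import pandas as pd


def convert_types(df):
    """Return a copy of the scraped frame with numeric years and real dates.

    Assumes the "Draft Eligible" and "Report Date" column names from the
    answer above and dates in M/D/YYYY form. Values that fail to parse become
    missing (<NA> for the year column, NaT for the date column).
    """
    out = df.copy()
    # Nullable integer dtype so a missing year does not force floats
    out["Draft Eligible"] = pd.to_numeric(out["Draft Eligible"], errors="coerce").astype("Int64")
    out["Report Date"] = pd.to_datetime(out["Report Date"], format="%m/%d/%Y", errors="coerce")
    return out
```

Working on a copy leaves the original string frame untouched, which is handy if you want to compare both.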

