
Scraping a website which has a table but the next button on the table doesn't change the URL

I want to scrape this page and get the whole table of players: https://www.nba.com/stats/leaders/?StatCategory=FG3M&PerMode=Totals&Season=2015-16&SeasonType=Regular%20Season

If you click the next button in the table, the contents of the table change but the URL in the address bar does not. The button also isn't a button tag. It looks like this:

<a class="stats-table-pagination__next" href="" alt="Next Page" ng-click="nav(1)">       
    <i class="fa fa-angle-right" aria-hidden="true"></i>
</a>

I tried using Beautiful Soup and Selenium to scrape this website, but I can't figure out how to navigate to the other pages of the table so that I can scrape them too. Please suggest a solution.

  1. Open the page in Google Chrome with the developer tools open. The table is filled from a JSON file that contains all the data you can see, and you can find it there.

  2. Go to the Network tab, refresh the page, and switch to the XHR filter. You will see many requests; one of them contains the player information.

  3. Once you have found the request with that data, click on it, copy its link address, and use the requests module to get the JSON and extract the information.
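The address copied in step 3 is usually an endpoint plus a query string, and here the query parameters mirror the ones in the page's own URL (StatCategory, PerMode, Season, SeasonType). As a rough sketch, the same address can be assembled from a dictionary, which makes it easier to try another season or stat category. The helper name `build_leaders_url` is hypothetical, and whether the endpoint accepts other parameter values is an assumption you would need to check in the Network tab.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the request copied from the Network tab
BASE_URL = "https://stats.nba.com/stats/leagueLeaders"


def build_leaders_url(season="2015-16", stat_category="FG3M"):
    """Assemble the request address from its query parameters (hypothetical helper)."""
    params = {
        "LeagueID": "00",
        "PerMode": "Totals",
        "Scope": "S",
        "Season": season,
        "SeasonType": "Regular Season",  # urlencode turns the space into '+'
        "StatCategory": stat_category,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_leaders_url())
```

With the default arguments this should reproduce the address for the 2015-16 FG3M leaders; the resulting string can be passed to `requests.get`.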

import requests

# JSON endpoint copied from the XHR list in the Network tab
url = "https://stats.nba.com/stats/leagueLeaders?LeagueID=00&PerMode=Totals&Scope=S&Season=2015-16&SeasonType=Regular+Season&StatCategory=FG3M"
res = requests.get(url)
data = res.json()

# Each row is a list of values; index 2 holds the player name
for row in data['resultSet']['rowSet']:
    print(row[2])

Output:

Stephen Curry
Klay Thompson
James Harden
Damian Lillard
..
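Indexing each row by position (`[2]` for the player name) works, but it is easy to lose track of which column is which. The sketch below assumes the response's `resultSet` also carries a `headers` list of column names next to `rowSet`; if it does, each row can be turned into a dictionary. The function name `rows_to_records` and the sample payload are hypothetical, and the sample values are placeholders rather than real statistics.

```python
def rows_to_records(payload):
    """Pair every row in payload['resultSet']['rowSet'] with the column names."""
    result = payload["resultSet"]
    headers = result["headers"]  # assumed key: list of column names
    return [dict(zip(headers, row)) for row in result["rowSet"]]


# Hand-written sample that mimics the assumed response layout
# (placeholder values, not real data).
sample = {
    "resultSet": {
        "headers": ["PLAYER_ID", "RANK", "PLAYER", "FG3M"],
        "rowSet": [
            [1, 1, "Player A", 10],
            [2, 2, "Player B", 8],
        ],
    }
}

records = rows_to_records(sample)
for record in records:
    print(record["PLAYER"], record["FG3M"])
```

With the `data` object from the request above, `rows_to_records(data)` would give one dictionary per player, provided the real response has the assumed layout.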

