Best way to scrape all the dynamically generated data from a drop-down menu using Python

I am creating a web scraper that will scrape dynamically generated player data from this website:

https://frozenpool.dobbersports.com/frozenpool_linecombo.php

I want to create a loop that generates the data for a player from the drop-down menu, scrapes that data, and then repeats this for every player in the menu.

I am curious whether using Selenium to interact with the site is the best way to do this. However, I have also noticed that the URL for every player follows a specific pattern, so I have considered scraping the initial page to collect all the data I need, using that data to construct a list of URLs, and then looping through those URLs and treating them as static pages.

Are there Python tools built for this specific type of web scraping?

It looks like all the information you need is loaded in the request to https://frozenpool.dobbersports.com/frozenpool_linecombo.php. Because the page doesn't make additional requests to get more information, Selenium is probably overkill for this project and you can stick with BeautifulSoup.
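As a rough sketch of what reading the player list out of that single response might look like, the snippet below collects the value and label of every option in one drop-down. It uses the standard library's html.parser rather than BeautifulSoup purely to stay dependency-free; BeautifulSoup would do the same job in fewer lines. The select name "forward" is an assumption guessed from the forward= query parameter in the player URLs, and the sample HTML and player IDs are made up, so check both against the real page source.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect (value, label) pairs from the <option>s of one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_target = False  # True while inside the <select> we care about
        self.current = None     # value attribute of the <option> being read
        self.label = []         # text chunks of the <option> being read
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select":
            self.finish_option()
            self.in_target = attrs.get("name") == self.select_name
        elif tag == "option" and self.in_target:
            # HTML allows </option> to be omitted, so close the previous one here.
            self.finish_option()
            self.current = attrs.get("value")

    def handle_data(self, data):
        if self.current is not None:
            self.label.append(data)

    def handle_endtag(self, tag):
        if tag == "option":
            self.finish_option()
        elif tag == "select":
            self.finish_option()
            self.in_target = False

    def finish_option(self):
        # Options with an empty or missing value (placeholders) are skipped.
        if self.current:
            self.options.append((self.current, "".join(self.label).strip()))
        self.current = None
        self.label = []


def extract_options(html, select_name="forward"):
    """Return [(value, label), ...] for the named drop-down (name is assumed)."""
    parser = OptionCollector(select_name)
    parser.feed(html)
    parser.close()
    parser.finish_option()  # flush an option left open at end of input
    return parser.options


# Made-up sample markup; the real page would be fetched first,
# for example with urllib.request.urlopen(url).read().
sample_html = """
<select name="forward">
  <option value="">Select a player</option>
  <option value="1001">Example Player A</option>
  <option value="1002">Example Player B
</select>
<select name="period"><option value="ALL">All</option></select>
"""

players = extract_options(sample_html)
for player_id, name in players:
    print(player_id, name)
```

The placeholder entry is dropped because its value is empty, and options from other drop-downs are ignored because only the named select is tracked.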

I think your idea of looping through each player using the structured URL is correct. If this is a one-off scrape, copy the list of players directly from the HTML of the page to get the player IDs, then loop through the following URL, replacing PLAYER_ID with each player's identifier:

http://frozenpool.dobbersports.com/frozenpool_linecombo.php?select=F&forward=PLAYER_ID&games=2019-2020%3AR%3A99&period=ALL&situation=ALL
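A minimal sketch of that loop, using only the standard library, might look like the following. The query parameters are taken from the URL above (%3A is an encoded colon, so games decodes to 2019-2020:R:99). The function names, the one-second pause between requests, and the UTF-8 decoding are assumptions for illustration, and the player IDs you pass in would come from the page's drop-down.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://frozenpool.dobbersports.com/frozenpool_linecombo.php"


def build_player_url(player_id, games="2019-2020:R:99"):
    """Fill the player ID into the URL pattern; urlencode escapes the colons."""
    params = {
        "select": "F",
        "forward": player_id,
        "games": games,
        "period": "ALL",
        "situation": "ALL",
    }
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_player_pages(player_ids, delay_seconds=1.0):
    """Yield (player_id, html) for each player, pausing between requests.

    The delay and the UTF-8 decoding are assumptions; adjust them to the site.
    """
    for player_id in player_ids:
        with urlopen(build_player_url(player_id)) as response:
            html = response.read().decode("utf-8", errors="replace")
        yield player_id, html
        time.sleep(delay_seconds)


# Building the URL does not touch the network ("1001" is a made-up ID).
print(build_player_url("1001"))
```

Each page returned by fetch_player_pages can then be handed to BeautifulSoup as if it were a static page.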
