
Python Scrape NBA Tracking Drives Data

I am fairly new to Python. I am trying to scrape the NBA Drives data from https://stats.nba.com/players/drives/

I used Chrome DevTools to find the API URL, then used the requests package to get the JSON string.

Original code:

import requests
headers = {"User-Agent": "Mozilla/5.0..."}
url = "https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight="
r = requests.get(url, headers=headers)
d = r.json()

However, this no longer works. For some reason, the request to the URL below times out on the NBA server, so I need to find a new way to get this information.

https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight=

While exploring Chrome DevTools, I found that the desired JSON string is shown in the Response tab of the XHR request under Network. Is there any way to scrape that into Python? See the image below.

Chrome DevTools: XHR Response JSON string

I tested the URL with the other headers that DevTools shows for this request, and it seems the Referer header is needed for it to work correctly.

EDIT 2020.08.15:

I had to add new headers to read it:

'x-nba-stats-origin': 'stats',
'x-nba-stats-token': 'true',

import requests

headers = {
    'User-Agent': 'Mozilla/5.0',
    #'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
    'Referer': 'https://stats.nba.com/players/drives/',
    #'Accept': 'application/json, text/plain, */*',

    'x-nba-stats-origin': 'stats',
    'x-nba-stats-token': 'true',
}

url = 'https://stats.nba.com/stats/leaguedashptstats?College=&Conference=&Country=&DateFrom=&DateTo=&Division=&DraftPick=&DraftYear=&GameScope=&Height=&LastNGames=0&LeagueID=00&Location=&Month=0&OpponentTeamID=0&Outcome=&PORound=0&PerMode=PerGame&PlayerExperience=&PlayerOrTeam=Player&PlayerPosition=&PtMeasureType=Drives&Season=2019-20&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&TeamID=0&VsConference=&VsDivision=&Weight='
r = requests.get(url, headers=headers)
data = r.json()

print(data)
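
The printed data is a nested dictionary, which is hard to read as-is. Below is a minimal sketch for turning it into one dictionary per player. It assumes the response uses the resultSets / headers / rowSet layout that stats.nba.com endpoints have commonly returned; this thread does not confirm that layout, so check your own response first. The helper name result_set_to_rows and the sample payload (including its column names and values) are made up for illustration.

```python
def result_set_to_rows(data, index=0):
    """Convert one result set into a list of dicts keyed by column name.

    Assumes data["resultSets"][index] has "headers" and "rowSet" keys;
    this layout is an assumption, not something the thread confirms.
    """
    result_set = data["resultSets"][index]
    headers = result_set["headers"]
    # Pair every row's values with the column names.
    return [dict(zip(headers, row)) for row in result_set["rowSet"]]


# Hypothetical, made-up payload in the assumed shape (not real NBA data).
sample = {
    "resultSets": [
        {
            "headers": ["PLAYER_NAME", "DRIVES"],
            "rowSet": [["Player A", 10.5], ["Player B", 8.0]],
        }
    ]
}

rows = result_set_to_rows(sample)
print(rows[0]["PLAYER_NAME"], rows[0]["DRIVES"])
```

With a real response, rows = result_set_to_rows(data) would be the equivalent call, provided the layout assumption holds.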

BTW: the same, but with the params as a dictionary, so it is easier to set different values.

import requests

headers = {
    'User-Agent': 'Mozilla/5.0',
    #'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
    'Referer': 'https://stats.nba.com/players/drives/',
    #'Accept': 'application/json, text/plain, */*',

    'x-nba-stats-origin': 'stats',
    'x-nba-stats-token': 'true',
}

url = 'https://stats.nba.com/stats/leaguedashptstats'

params = {
    'College': '',
    'Conference': '',
    'Country': '',
    'DateFrom': '',
    'DateTo': '',
    'Division': '',
    'DraftPick': '',
    'DraftYear': '',
    'GameScope': '',
    'Height': '',
    'LastNGames': '0',
    'LeagueID': '00',
    'Location': '',
    'Month': '0',
    'OpponentTeamID': '0',
    'Outcome': '',
    'PORound': '0',
    'PerMode': 'PerGame',
    'PlayerExperience': '',
    'PlayerOrTeam': 'Player',
    'PlayerPosition': '',
    'PtMeasureType': 'Drives',
    'Season': '2019-20',
    'SeasonSegment': '',
    'SeasonType': 'Regular Season',
    'StarterBench': '',
    'TeamID': '0',
    'VsConference': '',
    'VsDivision': '',
    'Weight': '',
}

r = requests.get(url, headers=headers, params=params)
#print(r.request.url)
data = r.json()

print(data)
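
To illustrate why the dictionary form is convenient, here is a small sketch that uses only the standard library to show how such a dictionary becomes a query string, and how a single value can be overridden without touching the rest. The helper build_url is hypothetical, the params dictionary is shortened for readability (the real request needs the full set above), and the season '2018-19' is only an example value. requests does its own encoding when you pass params=, so this is just a way to preview the result.

```python
from urllib.parse import urlencode

BASE_URL = 'https://stats.nba.com/stats/leaguedashptstats'


def build_url(params, **overrides):
    """Return BASE_URL with params (plus any overrides) as a query string."""
    merged = {**params, **overrides}  # overrides win; params is left unchanged
    return BASE_URL + '?' + urlencode(merged)


# Shortened params for illustration only; not the full set the API expects.
params = {
    'PerMode': 'PerGame',
    'PtMeasureType': 'Drives',
    'Season': '2019-20',
    'SeasonType': 'Regular Season',
}

url_2019 = build_url(params)
url_2018 = build_url(params, Season='2018-19')  # example override
print(url_2019)
print(url_2018)
```

Note that the space in 'Regular Season' is encoded as '+', matching the hand-written URL in the question. Uncommenting print(r.request.url) in the code above shows the URL that requests actually sent.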
