简体   繁体   English

使用python beautifulsoup刮取NBA高级统计数据

[英]Scraping NBA advanced stats with python beautifulsoup

I want to scrape the NBA advanced stats. 我想抓住NBA高级统计数据。 To start I just want to be able to scrape the names of the teams and I am having an issue where it is not collecting any of the information. 首先,我只是希望能够抓住团队的名字,我遇到的问题是它没有收集任何信息。 I may be looking for the wrong thing in the find_all function. 我可能在find_all函数中寻找错误的东西。 Any help is appreciated! 任何帮助表示赞赏!

 import requests from bs4 import BeautifulSoup url = "https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1" result = requests.get(url) c = result.content soup = Beaut ifulSoup(c,"html.parser") title = soup.title.text print(title) teams = soup.find_all('td',{'class':'team'}) for element in teams: print(element.text) 

Site that I want to scrape: 我要抓的网站:

我要抓的网站

Another way to do this is to send a get request to the site API and receive a json response. 另一种方法是向站点API发送get请求并接收json响应。 By alter the params you can get different results. 通过改变参数,你可以得到不同的结果。

You can look for where the request was sent to by your browser under the chrome developer tool. 您可以在Chrome开发人员工具下查找浏览器发送请求的位置。

import requests

url = "https://stats.nba.com/stats/leaguedashptstats?"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36"
}

params = {
    "PerMode": "PerGame",
    "PlayerOrTeam": "Team",
    "PtMeasureType": "ElbowTouch",
    "Season": "2018-19",
    "SeasonType": "Regular Season",
    "StarterBench": "",
    "PlayerPosition": "",
    "PlayerExperience": "",
    "GameScope": "",
    "VsConference": "",
    "VsDivision": "",
    "DateFrom": "",
    "DateTo": "",
    "SeasonSegment": "",
    "Location": "",
    "Outcome": "",
    "LastNGames": "0",
    "Month": "0",
    "OpponentTeamID": "0"
}

r = requests.get(url, params=params, headers=headers)
data = r.json()
results = data['resultSets'][0]['rowSet']

for result in results:
    print(result)

The site is dynamic, thus, you need to use selenium : 该网站是动态的,因此,您需要使用selenium

from selenium import webdriver
from bs4 import BeautifulSoup as soup 
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://stats.nba.com/teams/elbow-touch/?sort=ELBOW_TOUCHES&dir=-1')
s = soup(d.page_source, 'html.parser').find('table', {'class':'table'})
headers, [_, *data] = [i.text for i in s.find_all('th')], [[i.text for i in b.find_all('td')] for b in s.find_all('tr')]
final_data = [i for i in data if len(i) > 1]

Now, final_data stores the all team results: 现在, final_data存储了所有团队结果:

[['Houston Rockets', '63', '38', '25', '242.0', '367.0', '8.8', '2.4', '3.8', '64.2', '0.4', '0.7', '62.8', '5.5', '-', '3.7', '-', '0.5', '14.0', '0.5', '5.4', '0.3', '-'], ['Milwaukee Bucks', '63', '48', '15', '241.2', '409.5', '9.5', '2.3', '3.6', '62.4', '0.7', '1.0', '73.3', '5.4', '-', '4.3', '-', '0.6', '13.0', '0.5', '5.2', '0.4', '-'], ['New York Knicks', '62', '13', '49', '241.6', '420.4', '9.5', '2.0', '3.4', '56.8', '0.7', '1.0', '69.8', '4.8', '-', '4.7', '-', '0.6', '13.7', '0.5', '5.3', '0.5', '-'], ['Charlotte Hornets', '63', '29', '34', '242.0', '409.7', '9.6', '1.7', '3.5', '50.0', '1.1', '1.5', '71.9', '4.7', '-', '4.6', '-', '0.7', '14.2', '0.4', '4.5', '0.7', '-'], ['Detroit Pistons', '62', '31', '31', '242.8', '437.0', '10.0', '1.6', '3.2', '51.3', '0.9', '1.2', '75.3', '4.4', '-', '5.0', '-', '0.9', '17.6', '0.7', '6.8', '0.6', '-'], ['Washington Wizards', '62', '25', '37', '243.2', '420.2', '10.5', '2.5', '4.3', '58.4', '0.9', '1.2', '76.4', '6.1', '-', '4.6', '-', '0.7', '15.5', '0.6', '5.6', '0.5', '-'], ['Atlanta Hawks', '64', '22', '42', '242.3', '434.9', '11.0', '2.2', '3.7', '58.6', '1.2', '1.5', '77.3', '5.7', '-', '5.3', '-', '0.7', '12.9', '0.7', '6.5', '0.7', '-'], ['Brooklyn Nets', '65', '32', '33', '243.8', '440.3', '11.2', '2.5', '4.4', '58.3', '1.2', '1.7', '70.8', '6.4', '-', '4.6', '-', '0.7', '14.9', '0.9', '7.9', '0.8', '-'], ['San Antonio Spurs', '64', '35', '29', '241.6', '402.3', '11.3', '2.3', '4.1', '55.5', '0.8', '1.0', '85.7', '5.6', '-', '5.8', '-', '1.1', '18.7', '0.5', '4.8', '0.4', '-'], ['Boston Celtics', '64', '38', '26', '241.6', '420.8', '11.5', '2.5', '4.2', '58.4', '0.5', '0.7', '71.7', '5.5', '-', '5.7', '-', '0.9', '15.0', '0.6', '5.6', '0.3', '-'], ['Toronto Raptors', '64', '46', '18', '242.3', '418.0', '11.5', '3.5', '5.9', '59.6', '1.2', '1.5', '78.1', '8.3', '-', '4.1', '-', '0.7', '16.3', '0.4', '3.7', '0.7', '-'], ['Portland Trail Blazers', '63', '39', '24', '241.6', '409.8', '11.8', '2.4', '4.6', '51.9', '1.2', '1.5', '80.2', '6.1', '-', '5.5', '-', '1.0', '18.8', '0.7', '5.7', '0.7', '-'], ['Utah Jazz', '61', '36', '25', '240.8', '435.9', '11.9', '2.0', '3.8', '51.1', '1.4', '2.2', '66.7', '5.4', '-', '5.9', '-', '1.0', '17.1', '0.7', '5.9', '1.0', '-'], ['Minnesota Timberwolves', '63', '29', '34', '241.6', '412.4', '12.0', '2.9', '5.0', '57.3', '1.3', '1.6', '79.8', '7.3', '-', '5.2', '-', '1.0', '19.5', '0.6', '5.2', '0.7', '-'], ['Chicago Bulls', '63', '18', '45', '243.2', '411.3', '12.4', '2.8', '4.8', '57.9', '0.7', '0.9', '77.6', '6.4', '-', '6.3', '-', '0.8', '12.4', '0.6', '4.5', '0.4', '-'], ['LA Clippers', '65', '36', '29', '241.9', '430.4', '12.4', '2.9', '5.1', '56.9', '1.0', '1.5', '69.5', '7.0', '-', '5.4', '-', '0.9', '15.9', '0.7', '5.5', '0.6', '-'], ['Miami Heat', '62', '28', '34', '240.4', '426.1', '12.6', '2.0', '4.0', '50.2', '0.7', '1.3', '56.8', '4.9', '-', '7.0', '-', '1.1', '15.4', '0.4', '3.4', '0.5', '-'], ['New Orleans Pelicans', '65', '29', '36', '240.0', '435.0', '12.6', '3.5', '6.4', '54.8', '1.2', '1.6', '74.5', '8.4', '-', '4.4', '-', '0.9', '20.4', '0.7', '5.2', '0.8', '-'], ['Phoenix Suns', '64', '13', '51', '242.3', '435.8', '12.9', '2.8', '5.0', '56.7', '1.0', '1.3', '73.5', '6.8', '-', '6.2', '-', '0.8', '13.7', '0.6', '4.7', '0.6', '-'], ['Oklahoma City Thunder', '63', '39', '24', '242.0', '364.8', '13.6', '3.2', '5.8', '54.5', '1.0', '1.4', '65.9', '7.5', '-', '5.8', '-', '0.9', '14.7', '0.7', '4.8', '0.6', '-'], ['Dallas Mavericks', '62', '27', '35', '240.8', '435.4', '13.9', '1.8', '3.1', '55.9', '1.2', '1.6', '76.5', '5.0', '-', '8.6', '-', '1.1', '13.1', '0.8', '5.7', '0.7', '-'], ['Golden State Warriors', '63', '44', '19', '241.6', '442.3', '13.9', '2.8', '4.8', '57.0', '1.2', '1.5', '81.7', '6.9', '-', '7.2', '-', '1.6', '21.7', '0.8', '5.8', '0.7', '-'], ['Orlando Magic', '63', '28', '35', '241.2', '405.0', '14.0', '3.2', '5.7', '55.8', '1.1', '1.4', '80.9', '7.7', '-', '6.5', '-', '1.4', '21.8', '0.6', '4.0', '0.7', '-'], ['Los Angeles Lakers', '63', '30', '33', '241.6', '405.9', '14.2', '3.3', '5.7', '57.8', '1.1', '1.6', '67.0', '7.8', '-', '6.3', '-', '1.3', '20.7', '0.9', '6.3', '0.7', '-'], ['Denver Nuggets', '62', '42', '20', '240.8', '435.2', '15.0', '3.1', '5.3', '59.1', '1.1', '1.5', '72.5', '7.5', '-', '7.4', '-', '1.7', '22.3', '1.0', '6.4', '0.7', '-'], ['Indiana Pacers', '64', '41', '23', '240.4', '431.7', '15.3', '4.4', '7.2', '60.6', '1.4', '1.9', '74.2', '10.4', '-', '5.8', '-', '1.2', '20.9', '0.9', '6.0', '0.9', '-'], ['Cleveland Cavaliers', '64', '16', '48', '241.2', '407.3', '16.1', '2.3', '4.5', '51.6', '0.9', '1.1', '80.0', '5.6', '-', '10.0', '-', '1.2', '12.3', '0.5', '3.4', '0.4', '-'], ['Philadelphia 76ers', '63', '40', '23', '242.0', '446.9', '16.6', '2.5', '4.7', '52.7', '1.4', '1.7', '82.6', '6.6', '-', '9.6', '-', '1.8', '18.6', '0.7', '4.3', '0.7', '-'], ['Sacramento Kings', '62', '31', '31', '240.8', '425.2', '16.7', '3.2', '6.3', '50.3', '1.1', '1.6', '65.3', '7.5', '-', '8.0', '-', '1.5', '18.3', '1.0', '6.2', '0.7', '-'], ['Memphis Grizzlies', '65', '25', '40', '241.9', '452.1', '20.5', '3.4', '6.7', '51.3', '1.5', '1.9', '81.1', '8.6', '-', '11.2', '-', '1.6', '14.1', '0.8', '4.1', '0.8', '-']]

To get just the teams: 要获得团队:

teams = [a for a, *_ in final_data]

Output: 输出:

['Houston Rockets', 'Milwaukee Bucks', 'New York Knicks', 'Charlotte Hornets', 'Detroit Pistons', 'Washington Wizards', 'Atlanta Hawks', 'Brooklyn Nets', 'San Antonio Spurs', 'Boston Celtics', 'Toronto Raptors', 'Portland Trail Blazers', 'Utah Jazz', 'Minnesota Timberwolves', 'Chicago Bulls', 'LA Clippers', 'Miami Heat', 'New Orleans Pelicans', 'Phoenix Suns', 'Oklahoma City Thunder', 'Dallas Mavericks', 'Golden State Warriors', 'Orlando Magic', 'Los Angeles Lakers', 'Denver Nuggets', 'Indiana Pacers', 'Cleveland Cavaliers', 'Philadelphia 76ers', 'Sacramento Kings', 'Memphis Grizzlies']

To get specific statistics, it is easiest to create a list of dictionaries by binding the header values to the data lists: 要获取特定统计信息,最简单的方法是通过将标头值绑定到数据列表来创建字典列表:

data_attrs = [dict(zip(headers, i)) for i in final_data]
all_touches = [i['Touches'] for i in data_attrs]

A variation on @Ajax1234's answer can get you the whole table loaded into a dataframe: @ Ajax1234答案的变体可以让您将整个表加载到数据帧中:

import pandas as pd

pd.read_html(str(s))

And there's your table. 还有你的桌子。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM