
Can't find <div ng-view> from NBA stats website with BeautifulSoup Python

I'm trying to scrape this NBA website, https://stats.nba.com/team/1610612738/ . What I want is to extract each player's name, NO, POS, and the rest of the information for every player. The problem is that my code can't find <div ng-view>, which is the parent of the <nba-stat-table> element where the table lives.

My code so far is:

from selenium import webdriver
from bs4 import BeautifulSoup

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')

    url = 'https://stats.nba.com/team/1610612738/'

    driver.get(url)

    data = driver.page_source.encode('utf-8')

    soup = BeautifulSoup(data, 'lxml')

    div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
    print(div1.find_all('div'))

get_Player()

Use the JSON endpoint that the page itself calls to get that content. It's far easier to handle and there's no need for Selenium. You can find the URL in your browser's network tab.

import requests
import pandas as pd

r = requests.get(
    'https://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2018-19&TeamID=1610612738',
    headers={'User-Agent': 'Mozilla/5.0'}
).json()
players_info = r['resultSets'][0]
df = pd.DataFrame(players_info['rowSet'], columns = players_info['headers'])
print(df.head())
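If you only need the name, number, and position, you can select just those columns from the DataFrame. A minimal sketch below uses a hand-written sample shaped like the endpoint's `resultSets[0]` instead of a live request; the column names `PLAYER`, `NUM`, and `POSITION` are what this endpoint returned at the time of writing and should be verified against `r['resultSets'][0]['headers']`:

```python
import pandas as pd

# Sample shaped like the endpoint's resultSets[0] (headers + rowSet);
# the real response has more columns (HEIGHT, WEIGHT, BIRTH_DATE, ...).
players_info = {
    'headers': ['PLAYER', 'NUM', 'POSITION'],
    'rowSet': [
        ['Jayson Tatum', '0', 'F'],
        ['Jonathan Gibson', '3', 'G'],
    ],
}
df = pd.DataFrame(players_info['rowSet'], columns=players_info['headers'])

# Keep only the columns the question asks about (column names are an
# assumption based on the 2018-19 commonteamroster response).
roster = df[['PLAYER', 'NUM', 'POSITION']]
print(roster.to_string(index=False))
```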


The find_all function always returns a list; findChildren() returns all children of a Tag object (see the BeautifulSoup documentation for more details).
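To illustrate the difference on a minimal, hand-written table row (standing in for one row of the rendered roster table):

```python
from bs4 import BeautifulSoup

# One hand-written row as a stand-in for the rendered table.
html = '<tr><td>Jayson Tatum</td><td>#0</td><td>F</td></tr>'
tr = BeautifulSoup(html, 'html.parser').tr

# find_all('td') returns a list of the matching descendant tags...
cells = tr.find_all('td')
print([td.text for td in cells])

# ...while findChildren() with no arguments returns every child tag,
# which here is the same three <td> elements.
print([td.text for td in tr.findChildren()])
```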

Replace your code:

div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
print(div1.find_all('div')) 

with:

div = soup.find('div', {'class':"nba-stat-table__overflow"})
for tr in div.find("tbody").find_all("tr"):
    for td in tr.findChildren():
        print(td.text)

UPDATE:

from selenium import webdriver

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')

    url = 'https://stats.nba.com/team/1610612738/'

    driver.get(url)

    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "nba-stat-table__overflow")))

    data = driver.page_source.encode('utf-8')

    soup = BeautifulSoup(data, 'lxml')

    div = soup.find('div', {'class':"nba-stat-table__overflow"})
    for tr in div.find("tbody").find_all("tr"):
        for td in tr.findChildren():
            print(td.text)

get_Player()

Output:

Jayson Tatum
Jayson Tatum
#0
F
6-8
208 lbs
MAR 03, 1998
21
1
Duke
Jonathan Gibson
Jonathan Gibson
#3
G
6-2
185 lbs
NOV 08, 1987
31
2
New Mexico State
....

Why do you want to find all the divs? If it's just the player name you want to extract, you can use this CSS selector:

td.player a

Code :

all_player = driver.find_elements_by_css_selector('td.player a')
for playername in all_player:
    print(playername.text)
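The same selector also works in BeautifulSoup via select(), if you already have the rendered page source. A minimal sketch, with a hand-written row standing in for the Angular-generated table (the real page must be rendered before `driver.page_source` contains it):

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for two rendered roster rows; the class name
# 'player' matches the selector used with Selenium above.
html = '''
<table><tbody>
  <tr><td class="player"><a href="/player/1628369/">Jayson Tatum</a></td></tr>
  <tr><td class="player"><a href="/player/202269/">Jonathan Gibson</a></td></tr>
</tbody></table>
'''
soup = BeautifulSoup(html, 'html.parser')

# select() accepts the same CSS selector string as Selenium's
# find_elements_by_css_selector.
names = [a.text for a in soup.select('td.player a')]
print(names)
```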
