
[英]Can't find <div ng-view> from NBA stats website with BeautifulSoup Python

I'm trying to scrape this NBA website https://stats.nba.com/team/1610612738/ . What I'm trying to do is extract each player's name, NO, POS, and all the other information for every player. The problem is that my code can't find the <div ng-view> that is the parent of the <nba-stat-table> where the table lives.

My code so far is:

from selenium import webdriver
from bs4 import BeautifulSoup

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')

    url = 'https://stats.nba.com/team/1610612738/'

    driver.get(url)

    data = driver.page_source.encode('utf-8')

    soup = BeautifulSoup(data, 'lxml')

    div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
    print(div1.find_all('div'))

get_Player()

Use the JSON endpoint that the page itself calls to get that content. It is far easier and nicer to handle, and there is no need for Selenium. You can find the endpoint in the Network tab of your browser's developer tools.

import requests
import pandas as pd

r = requests.get('https://stats.nba.com/stats/commonteamroster?LeagueID=00&Season=2018-19&TeamID=1610612738',  headers = {'User-Agent' : 'Mozilla/5.0'}).json()
players_info = r['resultSets'][0]
df = pd.DataFrame(players_info['rowSet'], columns = players_info['headers'])
print(df.head())
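Once the JSON is in a DataFrame, you can keep just the columns the question asks for. The sketch below uses a hardcoded sample shaped like the commonteamroster response (the real payload has many more columns), so it runs without hitting the network; the column names follow the actual endpoint:

```python
import pandas as pd

# Minimal sample mimicking the shape of r['resultSets'][0] from the
# commonteamroster endpoint (real responses carry more columns).
sample = {
    'headers': ['PLAYER', 'NUM', 'POSITION', 'HEIGHT', 'WEIGHT'],
    'rowSet': [
        ['Jayson Tatum', '0', 'F', '6-8', '208'],
        ['Jonathan Gibson', '3', 'G', '6-2', '185'],
    ],
}

df = pd.DataFrame(sample['rowSet'], columns=sample['headers'])

# Keep only the fields the question asked for: name, number, position.
roster = df[['PLAYER', 'NUM', 'POSITION']]
print(roster.to_string(index=False))
```

With the live response, the same slicing works on the DataFrame built from `players_info` above.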


The find_all function always returns a list, and findChildren() returns all the child tags of a tag object (see the BeautifulSoup documentation for more details).
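The difference can be seen without touching the NBA site at all. The snippet below is a self-contained sketch using a made-up table fragment:

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tbody>
    <tr><td>Jayson Tatum</td><td>#0</td><td>F</td></tr>
    <tr><td>Jonathan Gibson</td><td>#3</td><td>G</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all() always returns a list-like ResultSet, even for one match.
rows = soup.find("tbody").find_all("tr")
print(type(rows).__name__)   # ResultSet
print(len(rows))             # 2

# findChildren() returns every descendant tag of the element.
for td in rows[0].findChildren():
    print(td.text)
```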

Replace your code:

div1 = soup.find('div', class_="columns / small-12 / section-view-overlay")
print(div1.find_all('div')) 

With:

div = soup.find('div', {'class':"nba-stat-table__overflow"})
for tr in div.find("tbody").find_all("tr"):
    for td in tr.findChildren():
        print(td.text)

UPDATE:

from selenium import webdriver

from bs4 import BeautifulSoup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def get_Player():
    driver = webdriver.PhantomJS(executable_path=r'D:\Documents\Python\Web Scraping\phantomjs.exe')

    url = 'https://stats.nba.com/team/1610612738/'

    driver.get(url)

    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CLASS_NAME, "nba-stat-table__overflow")))

    data = driver.page_source.encode('utf-8')

    soup = BeautifulSoup(data, 'lxml')

    div = soup.find('div', {'class':"nba-stat-table__overflow"})
    for tr in div.find("tbody").find_all("tr"):
        for td in tr.findChildren():
            print(td.text)

get_Player()

Output:

Jayson Tatum
Jayson Tatum
#0
F
6-8
208 lbs
MAR 03, 1998
21
1
Duke
Jonathan Gibson
Jonathan Gibson
#3
G
6-2
185 lbs
NOV 08, 1987
31
2
New Mexico State
....

Why do you want to find all the divs? If it's just the player names that you want to extract, you can use this CSS selector:

td.player a

Code:

all_player = driver.find_elements_by_css_selector('td.player a')
for playername in all_player:
    print(playername.text)
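The same `td.player a` selector also works with BeautifulSoup's select(), so it can be tried without a browser. The fragment below is an assumed, trimmed-down version of the roster table's player cells, just to demonstrate the selector:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the roster table's player column.
html = """
<table>
  <tr><td class="player"><a href="/player/1">Jayson Tatum</a></td></tr>
  <tr><td class="player"><a href="/player/2">Jonathan Gibson</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# select() takes the same CSS selector string as Selenium does.
names = [a.text for a in soup.select('td.player a')]
print('\n'.join(names))
```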
