
BS4 Not Locating Element in Python

I am somewhat new to Python and can't for the life of me figure out why the following code isn't pulling the element I am trying to get.

Here is the current code:

for player in all_players:

    player_first, player_last = player.split()
    player_first = player_first.lower()
    player_last = player_last.lower()
    first_name_letters = player_first[:2]
    last_name_letters = player_last[:5]

    player_url_code = '/{}/{}{}01'.format(last_name_letters[0], last_name_letters, first_name_letters)
    player_url = 'https://www.basketball-reference.com/players' + player_url_code + '.html'
    print(player_url) #test
    req = urlopen(player_url)
    soup = bs.BeautifulSoup(req, 'lxml')
    wrapper = soup.find('div', id='all_advanced_pbp')
    table = wrapper.find('div', class_='table_outer_container')


    for td in table.find_all('td'):
        player_pbp_data.append(td.get_text())

Currently returning:

--> for td in table.find_all('td'):
        player_pbp_data.append(td.get_text()) #if this works, would like to 

AttributeError: 'NoneType' object has no attribute 'find_all'

Note: iterating through the children of the wrapper object returns:

<div class="table_outer_container"> as part of the tree.

Thanks!

Try passing the HTML explicitly instead:

bs.BeautifulSoup(the_html, 'html.parser')
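A minimal sketch of that suggestion, assuming the response body is read and decoded into a string before it is handed to the parser. Here `io.BytesIO` is only a stand-in for the object returned by `urlopen`, and the sample markup is invented for illustration; it is not the real page.

```python
import io

from bs4 import BeautifulSoup

# Stand-in for the object returned by urlopen(player_url); the markup is invented.
response = io.BytesIO(
    b'<html><body><div id="all_advanced_pbp">'
    b'<div class="table_outer_container"><table><tr><td>1.5</td></tr></table></div>'
    b'</div></body></html>'
)

# Read and decode the body explicitly instead of passing the response object.
the_html = response.read().decode("utf-8")

soup = BeautifulSoup(the_html, "html.parser")
wrapper = soup.find("div", id="all_advanced_pbp")
cells = [td.get_text() for td in wrapper.find_all("td")]
print(cells)
```

Having the decoded string in a variable also makes it easy to print or save the markup and check what the server actually sent.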

Make sure that table contains the data you expect.

For example, https://www.basketball-reference.com/players/a/abdulka01.html doesn't seem to contain a div with id='all_advanced_pbp'.
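Since `find` returns `None` when nothing matches, one way to handle pages like that is to check each lookup before chaining the next call. The sketch below is only an illustration: `extract_pbp_cells` is a hypothetical helper name, and both HTML strings are made-up samples rather than real page content.

```python
from bs4 import BeautifulSoup


def extract_pbp_cells(html):
    """Return the text of every <td> in the play-by-play table, or [] if absent."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", id="all_advanced_pbp")
    if wrapper is None:
        # The page has no play-by-play section at all.
        return []
    table = wrapper.find("div", class_="table_outer_container")
    if table is None:
        # The section exists, but the table container was not parsed as a tag.
        return []
    return [td.get_text() for td in table.find_all("td")]


# Invented sample pages: one with the table, one without.
with_table = (
    '<div id="all_advanced_pbp"><div class="table_outer_container">'
    "<table><tr><td>12</td><td>34</td></tr></table></div></div>"
)
without_table = '<div id="other"><p>no play-by-play here</p></div>'

print(extract_pbp_cells(with_table))
print(extract_pbp_cells(without_table))
```

Returning an empty list keeps the loop over all players running instead of stopping at the first page without the section.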

I tried to extract data from the URL you gave, but the response did not contain the full DOM. I then loaded the page in a browser with and without JavaScript enabled, and found that the website needs JavaScript to load some of its data. Pages like the players index do not need it. A simple way to get dynamic data is to use Selenium.
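An alternative to a browser that may be worth trying, under an assumption the thread does not confirm: on some sites the table markup is shipped inside an HTML comment and only turned into real elements by JavaScript. That would match the question's note that the div shows up when iterating the wrapper's children while `find` still returns `None`. The sketch below parses such comments with Beautiful Soup's `Comment` class; `cells_from_commented_table` is a hypothetical helper and the sample markup is invented.

```python
from bs4 import BeautifulSoup, Comment

# Invented sample: the table container sits inside an HTML comment (an assumption
# about how the real page is built, not something confirmed by the thread).
sample_html = """
<div id="all_advanced_pbp">
  <div class="section_heading"><h2>Play-by-Play</h2></div>
  <!--
  <div class="table_outer_container">
    <table><tr><td>55</td><td>45</td></tr></table>
  </div>
  -->
</div>
"""


def cells_from_commented_table(html):
    """Collect <td> text from tables hidden inside comments of the wrapper div."""
    soup = BeautifulSoup(html, "html.parser")
    wrapper = soup.find("div", id="all_advanced_pbp")
    if wrapper is None:
        return []
    cells = []
    # Comments are string nodes, so select them by type and re-parse their text.
    for comment in wrapper.find_all(string=lambda text: isinstance(text, Comment)):
        inner = BeautifulSoup(comment, "html.parser")
        table = inner.find("div", class_="table_outer_container")
        if table is not None:
            cells.extend(td.get_text() for td in table.find_all("td"))
    return cells


print(cells_from_commented_table(sample_html))
```

If the comments on the real page turn out to be empty, the data really is loaded by script and a browser-driven approach such as Selenium is the way to go.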

This is my test code:

import requests
from bs4 import BeautifulSoup
from selenium import webdriver

player_pbp_data = []

def get_list(t="a"):
    with requests.Session() as se:
        url = "https://www.basketball-reference.com/players/{}/".format(t)
        req = se.get(url)
        soup = BeautifulSoup(req.text,"lxml")
        with open("a.html","wb") as f:
            f.write(req.text.encode())
        table = soup.find("div",class_="table_wrapper setup_long long")
        players = {player.a.text:"https://www.basketball-reference.com"+player.a["href"] for player in table.find_all("th",class_="left ")}
        return players  # maps player name -> profile URL


def get_each_player(player_url="https://www.basketball-reference.com/players/a/abdulta01.html"):

    with webdriver.Chrome() as ph:
        ph.get(player_url)
        text = ph.page_source

    '''
    with requests.Session() as se:
        text = se.get(player_url).text
    '''

    soup = BeautifulSoup(text, 'lxml')
    try:
        wrapper = soup.find('div', id='all_advanced_pbp')
        table = wrapper.find('div', class_='table_outer_container')
        for td in table.find_all('td'):
            player_pbp_data.append(td.get_text())
    except Exception as e:
        print("This page does not contain pbp")



get_each_player()


 