
BeautifulSoup ESPN: Scraping sports scores, but .findAll gives an empty ResultSet. How do I pull the proper info?

Beginning Python and BeautifulSoup user here.

I'm trying to scrape some sports scores from the ESPN website, but the results come back empty.

Sample Target: ESPN Website > NBA > Scores

I want to get info such as Team Name, Score, Record, and Quarter/Final, but since I'm having trouble I'll start with just the Score: I would like to get each team's total score.

from bs4 import BeautifulSoup as bs
from urllib.request import urlopen as uReq

html_url = 'http://www.espn.co.uk/nba/scoreboard'

# Download the page
uClient = uReq(html_url)
page_html = uClient.read()
uClient.close()

# Parse it and look for the cells holding the total scores
page_soup = bs(page_html, 'html.parser')
containers = page_soup.findAll('td', {"class": "total"})

print(len(containers))
print(type(containers))

Output

0
<class 'bs4.element.ResultSet'>

I spent the whole day trying to figure out why all my results keep coming back NoneType or empty, but I can't seem to work it out.

I tried just looking for 'td' and this is the result:

containers = page_soup.findAll('td')

print(len(containers))
print(type(containers))

Output

0
<class 'bs4.element.ResultSet'>

I'm not sure why I'm unable to pull the data. Is something going on behind the scenes, such as ESPN purposely preventing scraping? I have tried looking through different tags, attributes, etc., but can't figure it out. Thank you.

I believe the problem is that the scores are rendered dynamically by JavaScript. urlopen only downloads the initial HTML and never runs the page's scripts, so the td elements you are looking for are not in the document that BeautifulSoup parses. One way around this is to use Selenium and BeautifulSoup together to parse dynamic web content: Selenium drives a real browser, and BeautifulSoup parses the rendered page. Try running the code below to get the scores you were looking for:

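from bs4 import BeautifulSoup
from selenium import webdriver

# Open the page in a real browser so that its JavaScript runs
driver = webdriver.Firefox()
driver.get("http://www.espn.co.uk/nba/scoreboard")

# Grab the page as the browser currently has it, then close the browser
html = driver.page_source
driver.quit()

soup = BeautifulSoup(html, "lxml")

for tag in soup.find_all("td", {"class": "total"}):
    print(tag.text)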

This produces the following output:

87
99
106
102
123
131

You may need to install Selenium (pip install selenium) and have geckodriver, the Firefox driver, on your system PATH for the script to work.

EDIT: Updated to specify the lxml HTML parser recommended by the BeautifulSoup documentation for its speed.
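One caveat with reading driver.page_source right after driver.get(): the scripts may not have finished filling in the table yet. Below is a minimal sketch of the same idea using Selenium's explicit wait instead. The function names fetch_scores and parse_scores, the 15-second timeout, and the td.total selector as a default are my own assumptions for illustration, not anything ESPN documents.

```python
def parse_scores(cell_texts):
    """Turn the text of score cells into ints, skipping empty or non-numeric cells."""
    scores = []
    for text in cell_texts:
        text = text.strip()
        if text.isdecimal():
            scores.append(int(text))
    return scores


def fetch_scores(url, selector="td.total", timeout=15):
    """Load url in Firefox, wait until selector matches, and return the scores."""
    # Imported here so parse_scores can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Block until at least one matching cell exists in the DOM;
        # raises TimeoutException if none appears within `timeout` seconds.
        cells = WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        return parse_scores(cell.text for cell in cells)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling fetch_scores("http://www.espn.co.uk/nba/scoreboard") should then return a list of integers, assuming the page still marks the totals with td.total. Here Selenium reads the cells directly, so BeautifulSoup is not needed.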

The data you are trying to get is rendered by JavaScript running in your browser, so it is not in the HTML that urlopen downloads. I recommend Requests-HTML, which can render the page for you.

Code:

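from requests_html import HTMLSession

session = HTMLSession()
r = session.get('http://www.espn.co.uk/nba/scoreboard')

# Run the page's JavaScript in headless Chromium
# (Chromium is downloaded automatically on the first call)
r.html.render()

# Select the total-score cells with a CSS selector
for tag in r.html.find('td.total'):
    print(tag.text)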

Output:

106
102
123
131
105
121
102
115

Don't forget to install it first with: pip install requests-html. Have fun! :)
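If you want to confirm that JavaScript rendering really is the cause before switching tools, you can check whether the raw HTML contains the tag at all. The sketch below uses only the standard library, and the two HTML strings are made-up stand-ins for the server response and the rendered page, not ESPN's actual markup. The names TagCounter and count_tags are hypothetical helpers.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count start tags with a given name and, optionally, a given CSS class."""

    def __init__(self, tag, css_class=None):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class is None or self.css_class in classes:
            self.count += 1


def count_tags(html, tag, css_class=None):
    """Return how many matching start tags appear in the given HTML text."""
    counter = TagCounter(tag, css_class)
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up example of what the server sends: an empty container plus a script.
RAW_HTML = """
<html><body>
  <div id="scoreboard"></div>
  <script>/* fills the scoreboard with a table at runtime */</script>
</body></html>
"""

# Made-up example of the same page after the script has run in a browser.
RENDERED_HTML = """
<html><body>
  <div id="scoreboard"><table><tr>
    <td class="team">Home</td><td class="total">102</td>
  </tr><tr>
    <td class="team">Away</td><td class="total">99</td>
  </tr></table></div>
</body></html>
"""

print(count_tags(RAW_HTML, "td"))                # 0: no cells in the raw HTML
print(count_tags(RENDERED_HTML, "td", "total"))  # 2: cells exist after rendering
```

If count_tags returns 0 for the text you downloaded with urlopen while the browser's element inspector shows the cells, the content is being added by JavaScript, and no choice of tag or attribute in findAll will find it.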
