简体   繁体   中英

BeautifulSoup can't find class that exists on webpage?

So I am trying to scrape the following webpage https://www.scoreboard.com/uk/football/england/premier-league/ ,

Specifically the scheduled and finished results. Thus I am trying to look for the elements with class = "stage-finished" or "stage-scheduled" . However when I scrape the webpage and print out what page_soup contains, it doesn't contain these elements.

I found another SO question with an answer saying that this is because it is loaded via AJAX and I need to look at the XHR under the network tab on chrome dev tools to find the file thats loading the necessary data, however it doesn't seem to be there?

import bs4
import requests
from bs4 import BeautifulSoup as soup
import csv
import datetime

myurl = "https://www.scoreboard.com/uk/football/england/premier-league/"
headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}
page = requests.get(myurl, headers=headers)

page_soup = soup(page.content, "html.parser")

scheduled = page_soup.select(".stage-scheduled")
finished = page_soup.select(".stage-finished")
live = page_soup.select(".stage-live")
print(page_soup)
print(scheduled[0])

The above code throws an error of course as there is no content in the scheduled array.

My question is, how do I go about getting the data I'm looking for?

I copied the contents of the XHR files to a notepad and searched for stage-finished and other tags and found nothing. Am I missing something easy here?

The page is JavaScript rendered. You need Selenium. Here is some code to start on:

from selenium import webdriver

url = 'https://www.scoreboard.com/uk/football/england/premier-league/'

driver = webdriver.Chrome()
driver.get(url)
stages = driver.find_elements_by_class_name('stage-scheduled')
driver.close()

Or you could pass driver.content in to the BeautifulSoup method. Like this:

soup = BeautifulSoup(driver.page_source, 'html.parser')

Note: You need to install a webdriver first. I installed chromedriver.

Good luck!

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM