
Unable to access span class while web scraping with BeautifulSoup

I am trying to extract the number-of-players data from this page: https://boardgamegeek.com/boardgame/174430/gloomhaven/stats

from bs4 import BeautifulSoup as bs
import requests
url2 = "https://boardgamegeek.com/boardgame/174430/gloomhaven"
page3 = requests.get(url2)
s2 = bs(page3.content,"html.parser")
var2 = s2.find_all('span',{'class':'ng-scope ng-isolate-scope'})

When I run this code, var2 is always an empty list. I even tried selecting the 'div' that contains the 'span', but I still get an empty list. Why is this?

Thanks in advance.

The page content is loaded dynamically by JavaScript. If you disable JavaScript in your browser, you will notice that the content disappears from the page. That is why you are getting an empty list in var2: requests only downloads the initial HTML and does not run JavaScript, so BeautifulSoup has no such span to find. You need a browser automation tool such as Selenium. Here I use Selenium with BeautifulSoup.
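One way to check this without opening a browser is to count the target spans in the HTML you actually downloaded. The sketch below uses two made-up HTML strings as stand-ins (they are not the real boardgamegeek.com markup), and count_target_spans is a hypothetical helper name:

```python
from bs4 import BeautifulSoup


def count_target_spans(html: str) -> int:
    """Count spans whose class attribute is the one used in the question."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("span", {"class": "ng-scope ng-isolate-scope"}))


# Made-up stand-in for what requests downloads: an empty app shell, no data yet.
raw_html = (
    '<html><body><div ng-view></div>'
    '<script src="app.js"></script></body></html>'
)

# Made-up stand-in for the DOM after a browser has run the scripts.
rendered_html = (
    '<html><body><div ng-view>'
    '<span class="ng-scope ng-isolate-scope">1–4</span>'
    '</div></body></html>'
)

print(count_target_spans(raw_html))       # 0
print(count_target_spans(rendered_html))  # 1
```

Passing page3.content from the question's code to the same helper should show whether the span is present in the raw response at all.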

Because 'class':'ng-scope ng-isolate-scope' matches only one element on this page, call find instead of find_all.
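The difference between the two calls can be seen on a small hand-written snippet. This is only an illustration: the markup below is made up, not copied from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the rendered page.
html = """
<div class="gameplay-item">
  <span class="ng-scope ng-isolate-scope">1–4</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
attrs = {"class": "ng-scope ng-isolate-scope"}

matches = soup.find_all("span", attrs)  # always a list, possibly empty
first = soup.find("span", attrs)        # first matching tag, or None

print(len(matches))  # 1
print(first.text)    # 1–4

# find returns None when nothing matches, so check before reading .text
missing = soup.find("span", {"class": "no-such-class"})
print(missing)       # None
```

Since find returns None on a miss, calling .text on the result raises AttributeError if the page has not finished rendering, which is worth guarding against in a real scraper.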

Script

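from bs4 import BeautifulSoup
import time
from selenium import webdriver

# Recent Selenium 4 releases locate a matching chromedriver themselves and
# no longer accept the driver path as a positional argument.
driver = webdriver.Chrome()
driver.maximize_window()

url = 'https://boardgamegeek.com/boardgame/174430/gloomhaven/stats'
driver.get(url)
time.sleep(5)  # crude wait for the JavaScript-rendered content to appear

# Parse the HTML as rendered by the browser, not the raw server response
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # close the browser once the rendered HTML is captured

# find returns the first matching span
var2 = soup.find('span', {'class': 'ng-scope ng-isolate-scope'}).text
print(var2)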

Output

1–4


 