简体   繁体   中英

Scrape a text that was written by javascript from website

I am using BeautifulSoup to scrape character info from a website. When trying to get the win rate of a character, BeautifulSoup can't find it.

When I inspect the text, it is listed as under . All I can find in the sites source code and all that BeautifulSoup finds is "ranking-stats-placeholder".

This is the code I am currently using.

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://u.gg/lol/champions/darius/build/?role=top"

#opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

#html parsing
page_soup = soup(page_html, "html.parser")

#champion name
champ_name = page_soup.findAll("span", {"class":"champion-name"})[0].text

#champion win rate
champ_wr = page.soup.findAll("div", {"class":"win-rate okay-tier"})

I believe that the win rate text is added by javascript, but I have no idea how I can get the text. The code I currently have returns "None" for champ_wr

Although this text technically could be in the javascript itself, my first guess is the JS is pulling it in via an ajax request. Have your program simulate that and you'll probably get all the data you need handed right to you without any scraping involved!

It will take a little detective work though. I suggest turning on your network traffic logger (such as "Web Developer Toolbar" in Firefox) and then visiting the site. Focus your attention attention on any/all XmlHTTPRequests.

Best of luck!

not sure how tied to BeautifulSoup you are, but I can get selenium doing useful things with:

# load code from selenium package
from selenium.webdriver import Remote
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# start an instance of Chrome up
chrome = Service('/usr/local/bin/chromedriver')
chrome.start()
driver = Remote(chrome.service_url)

# get the page loading
driver.get("https://u.gg/lol/champions/darius/build/?role=top")

# wait for the win rate to be populated
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "win-rate")))

# get the values you wanted
name = driver.find_element_by_class_name("champion-name").text
winrate = driver.find_element_by_class_name("win-rate").text

# display them
print(f"name: {repr(name)}, winrate: {winrate.split()[0]}")

# clean up a bit
driver.quit()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM