Rendering a JavaScript-generated page using Python Selenium

I am using Python with Splinter and Selenium (ChromeDriver) to scrape a web page. The page has a table that is built with JavaScript, but when Beautiful Soup parses the page, the table isn't there. I am having trouble getting the table rendered so that I can parse it with Beautiful Soup. How do I do this with Selenium? If I can't, what libraries should I be using?

Here is an example of what I have:

import pandas as pd
from bs4 import BeautifulSoup as bs
import pymongo
import requests
from splinter import Browser
from datetime import date
from flask_pymongo import PyMongo
import datetime
executable_path = {'executable_path': '/usr/local/bin/chromedriver'}
browser = Browser('chrome', **executable_path, headless=True)
url = "https://www.onthesnow.com/epic-pass/skireport.html"
browser.visit(url)
browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
html = browser.html
soup = bs(html, 'html.parser')
response = requests.get(url)
soup = bs(response.text, 'html.parser')

The link to the website with the table: https://www.onthesnow.com/epic-pass/skireport.html

Thanks in advance for any help.

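Use WebDriver.page_source (https://selenium-python.readthedocs.io/api.html#selenium.webdriver.remote.webdriver.WebDriver.page_source). It returns the page as the browser currently holds it, including elements added by JavaScript. Then pass it to bs4 to parse.

Note that the code in the question first builds soup from browser.html, but then overwrites it with the result of requests.get(url). requests only downloads the raw HTML and never runs the page's JavaScript, so the table is missing from that second soup.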

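from bs4 import BeautifulSoup
from selenium import webdriver

url = "https://www.onthesnow.com/epic-pass/skireport.html"

browser = webdriver.Chrome()
browser.get(url)
# page_source is the DOM as rendered by the browser, not the raw HTTP response
html = browser.page_source
browser.quit()
soup = BeautifulSoup(html, "lxml")  # requires the lxml package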
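
If the table is filled in some time after the page loads, reading page_source immediately after get() may still be too early. Below is a minimal sketch of one way to handle that, using an explicit wait before grabbing the source. The helper names fetch_rendered_html and table_rows are made up for this example, and the "table" CSS selector is an assumption, since the real page's markup isn't shown in the question; adjust it to whatever element the script actually creates.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector="table", timeout=10):
    """Load url in Chrome and return the HTML once css_selector is present.

    Hypothetical helper; css_selector="table" is an assumed default.
    """
    # Selenium is imported here so that table_rows() below can also be
    # used on saved HTML when Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Chrome()
    try:
        browser.get(url)
        # Block until the element exists in the DOM; raises
        # TimeoutException if it has not appeared after `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return browser.page_source
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()


def table_rows(html):
    """Return the first <table> in html as a list of rows of cell text."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
```

With that in place, something like rows = table_rows(fetch_rendered_html("https://www.onthesnow.com/epic-pass/skireport.html")) should give a list of lists that can be passed to pd.DataFrame, provided the selector matches the page.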

