Using bs4 and requests (or Selenium) in Python, is it possible to get the information that is added after page load (most likely by JS)?

Question

I am working on a project in Python and this is my first time using bs4 and requests. I have a page that loads, but all of the information is added afterwards by JavaScript. Using bs4 and requests I can't seem to get the data added by JS. How would I do it?

import requests
from bs4 import BeautifulSoup

page = "https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com"
# requests downloads the initial HTML only; it does not execute any JavaScript
result = requests.get(page)
source = result.text

soup = BeautifulSoup(source, 'html.parser')

if soup.head.parent.name == 'html':
    print(soup.title)
    tmpBody = soup.body
    # print(soup)
    # Elements that the page's JS creates later are not in `source`,
    # so this lookup can come back as None
    div1 = soup.find(id="ember63")
    print(soup.find_all('section'))
    print(div1)
else:
    print("not html")
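To make the failure concrete, here is a minimal, standard-library-only sketch (no bs4, no network) that collects the `id` attributes of two HTML strings. Both strings are hypothetical stand-ins written for illustration: `SHELL` approximates the kind of near-empty page a JS-driven site returns to `requests.get()`, and `RENDERED` approximates the DOM after a browser has run the scripts. The helper names `IdCollector` and `ids_in` are likewise made up for this example.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in(html):
    """Return the set of id values found in an HTML string."""
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


# Hypothetical app shell: roughly what a plain HTTP GET receives.
SHELL = """
<html><head><title>Covid-19 Home</title></head>
<body><div id="root"></div><script src="app.js"></script></body></html>
"""

# Hypothetical DOM after a browser has executed the page's scripts.
RENDERED = """
<html><head><title>Covid-19 Home</title></head>
<body><div id="root"><div id="ember63">
<section>Total # of Cases</section></div></div></body></html>
"""

print("ember63" in ids_in(SHELL))     # False: nothing has run the JS
print("ember63" in ids_in(RENDERED))  # True: the script-built element exists
```

Any HTML parser, BeautifulSoup included, can only search the markup it is handed, so the fix is to hand it the rendered DOM (for example via a real browser driven by Selenium) rather than to change the parsing code.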

I found some code similar to this on Stack Overflow, but it fails with an error saying the chromedriver executable needs to be on PATH, and I am not sure what chromedriver is.

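from bs4 import BeautifulSoup
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
# Selenium controls Chrome through the separate chromedriver executable;
# `options=` is the keyword accepted by current Selenium releases
driver = webdriver.Chrome(options=chrome_options)

url = "https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com"
driver.get(url)
# Snapshot of the DOM as rendered by Chrome, including JS-added elements
# (data fetched after the load event may still be missing at this point)
page = driver.page_source
driver.quit()

soup = BeautifulSoup(page, 'html.parser')

if soup.head.parent.name == 'html':
    print(soup.title)
    tmpBody = soup.body
    div1 = soup.find(id="ember63")
    print(soup.find_all('section'))
    print(div1)
else:
    print("not html")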
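The PATH error means Selenium could not locate ChromeDriver, the standalone executable (published by the Chromium project) that Selenium uses to control Chrome; its version has to match the installed Chrome. A quick way to see whether your shell can find it is the standard-library `shutil.which`, sketched below; `find_chromedriver` is a hypothetical helper name for this example.

```python
import shutil


def find_chromedriver():
    """Return the full path of chromedriver if it is on PATH, else None.

    shutil.which honours PATHEXT on Windows, so chromedriver.exe is found too.
    """
    return shutil.which("chromedriver")


path = find_chromedriver()
if path is None:
    print("chromedriver not found on PATH")
else:
    print("chromedriver found at", path)
```

If it is not found, either add the folder containing chromedriver to PATH, or pass its location explicitly in Selenium 4 with `webdriver.Chrome(service=Service("/path/to/chromedriver"))` after `from selenium.webdriver.chrome.service import Service` (the path here is a placeholder). Selenium 4.6 and later also ship Selenium Manager, which can download a matching driver automatically when none is found.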

Answer 1

Without knowing exactly what you want, I guess this is close:

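from selenium import webdriver
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(options=chrome_options)

url = "https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com"
driver.get(url)
# Keep retrying element lookups for up to 15 s while the page's JS builds the content
driver.implicitly_wait(15)

# Selenium 4 locator API (find_element_by_css_selector was removed in 4.3)
div1 = driver.find_element(By.CSS_SELECTOR, '#ember63').text.strip()
print(*[section.text.strip() for section in driver.find_elements(By.CSS_SELECTOR, 'section')])
print(div1)
driver.quit()  # close the browser and end the chromedriver session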

prints:

Covid-19 Home Pandemic Update
Total # of Cases
......
New Cases
Last new case: July 26, 2020
Total Recovered
......
Currently Hospitalized
.......
Currently in ICU
......
Total Deaths
...... Active Cases
......
Total # of People Tested
......

And so on.

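This also does it all in the same library, without needing BeautifulSoup. Why Selenium is not working for you is answered in the comments (in short: Selenium needs ChromeDriver, a separate executable that controls Chrome, installed and on your PATH).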
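`driver.implicitly_wait(15)` works because Selenium keeps polling for the element until it appears or the timeout expires. The standard-library sketch below shows that polling idea on its own, with a simulated lookup standing in for the browser; `wait_until` and `fake_find_element` are hypothetical names for illustration, not Selenium APIs.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the "element" only shows up on the third lookup,
# as if the page's JS had finished rendering in the meantime.
checks = {"count": 0}


def fake_find_element():
    checks["count"] += 1
    return "Total # of Cases" if checks["count"] >= 3 else None


print(wait_until(fake_find_element, timeout=5, poll=0.01))
```

In Selenium itself the explicit-wait equivalent is `WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.CSS_SELECTOR, "#ember63")))`, using `from selenium.webdriver.support.ui import WebDriverWait` and `from selenium.webdriver.support import expected_conditions as EC`. The Selenium documentation advises against mixing implicit and explicit waits in the same session.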

Answer 1 (score 0): accepted, 2020-07-27 16:25:20