
Using bs4 and requests (or Selenium) in Python, is it possible to get the information that is added after page load (most likely by JS)?

I am working on a project in Python, and it is my first time using bs4 and requests. I have a page that loads, but all the information is added afterwards by JS. Using bs4 and requests I can't seem to get the data added by JS. How would I do it?

import requests
from bs4 import BeautifulSoup

page = "https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com"
result = requests.get(page)
source = result.text

soup = BeautifulSoup(source, 'html.parser')

if soup.head.parent.name == 'html':
    print(soup.title)
    tmpBody = soup.body
    # print(soup)
    div1 = soup.find(id="ember63")
    print(soup.find_all('section'))
    print(div1)
else:
    print("not html")

I found some code similar to this on Stack Overflow, but it says the chromedriver executable needs to be on PATH, and I am not sure what chromedriver is.

from bs4 import BeautifulSoup
from selenium import webdriver

chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(chrome_options=chrome_options)

url = "https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com"
driver.get(url)
page = driver.page_source

soup = BeautifulSoup(page, 'html.parser')

if soup.head.parent.name == 'html':
    print(soup.title)
    tmpBody = soup.body
    div1 = soup.find(id="ember63")
    print(soup.find_all('section'))
    print(div1)
else:
    print("not html")

Without knowing exactly what you want, I guess this is close:

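from selenium import webdriver
from selenium.webdriver.common.by import By

chrome_options = webdriver.ChromeOptions()
# Selenium 4 takes the options via `options=`; `chrome_options=` is the older keyword
driver = webdriver.Chrome(options=chrome_options)

url = "https://covid-19-newfoundland-and-labrador-gnl.hub.arcgis.com"
driver.get(url)
# Element lookups below retry for up to 15 seconds, giving the page's JS time to render
driver.implicitly_wait(15)

# Selenium 4 locator API; older releases used find_element_by_css_selector
div1 = driver.find_element(By.CSS_SELECTOR, '#ember63').text.strip()
print(*[section.text.strip() for section in driver.find_elements(By.CSS_SELECTOR, 'section')])
print(div1)
driver.quit()  # close the browser and stop the driver process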

prints:

Covid-19 Home Pandemic Update
Total # of Cases
......
New Cases
Last new case: July 26, 2020
Total Recovered
......
Currently Hospitalized
.......
Currently in ICU
......
Total Deaths
...... Active Cases
......
Total # of People Tested
......

And so on..
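
The implicit wait in the snippet is what gives the page's JavaScript time to fill in the elements before their text is read. As a rough illustration of that idea, here is a hand-rolled polling helper. `wait_for_text` is a hypothetical name, not part of Selenium, and the sketch assumes a driver whose `find_element(by, value)` accepts the string "css selector" (the value of Selenium's `By.CSS_SELECTOR`). Selenium ships its own `WebDriverWait` for this, which is the more usual choice in real code.

```python
import time


def wait_for_text(driver, css_selector, timeout=15.0, poll=0.5):
    """Poll until the element matched by css_selector has non-empty text.

    Hypothetical helper for illustration; `driver` only needs a
    find_element(by, value) method, as a Selenium WebDriver has.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # "css selector" is the string value of Selenium's By.CSS_SELECTOR
            text = driver.find_element("css selector", css_selector).text.strip()
            if text:
                return text
        except Exception:
            # Element not in the DOM yet (or replaced mid-read): retry
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no text in {css_selector!r} after {timeout}s")
        time.sleep(poll)
```

With a real driver this would be called as `wait_for_text(driver, '#ember63')` after `driver.get(url)`. Catching every exception is a simplification made to keep the sketch free of Selenium imports.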

The Selenium code above also does it all in one library, without the need for BeautifulSoup. Why Selenium was not working for you is answered in the comments.
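
On the chromedriver error from the question: ChromeDriver is a separate executable that Selenium uses to control Chrome, and older Selenium releases expect to find it on PATH. A quick way to check is sketched below using only the standard library. `find_driver` is a hypothetical helper name, and the executable name "chromedriver" is assumed from the error message quoted in the question.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable if it is on PATH, else None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver not found on PATH")
else:
    print(f"chromedriver found at {path}")
```

If it reports that nothing was found, downloading the ChromeDriver build that matches the installed Chrome version and placing it in a directory listed in PATH is the usual fix. Recent Selenium 4 releases can also locate or download a driver themselves.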


 