Scraping a jQuery table with BeautifulSoup and Selenium

Question

I am trying to get data from a table on this site, http://mananciais.sabesp.com.br/HistoricoSistemas?SistemaId=0, which is updated daily.

I am learning BeautifulSoup and Selenium and am trying to access the data with these packages.

However, the site's source code does not contain the data shown in the table. As far as I understand, the site is built with jQuery Grid.
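A quick way to see why parsing the source finds nothing: the HTML the server sends can contain just an empty grid container, which the grid's JavaScript fills in later in the browser. The following standard-library sketch collects the text nested under an element id. The id "jqxGrid" and both HTML snippets are hypothetical placeholders, not the site's real markup.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextUnderId(HTMLParser):
    """Collects the text nested inside the element whose id is target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # 0 = outside the target; >0 = nesting level inside it
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def text_under_id(html, target_id):
    parser = TextUnderId(target_id)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical markup: what the server sends vs. what the browser shows after JavaScript runs.
static_html = '<div id="jqxGrid"></div>'
rendered_html = '<div id="jqxGrid"><div><span>cell 1</span><span>cell 2</span></div></div>'

print(text_under_id(static_html, "jqxGrid"))    # []
print(text_under_id(rendered_html, "jqxGrid"))  # ['cell 1', 'cell 2']
```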

This is all I have so far, although I have tried several things:

import os
import time
from bs4 import BeautifulSoup
from selenium import webdriver

# Choose the driver
driver = webdriver.Firefox()

# Open the site
site = 'http://mananciais.sabesp.com.br/HistoricoSistemas'
driver.get(site)

# page_source is read right after get() returns, possibly before the grid has been filled in
soup = BeautifulSoup(driver.page_source, 'html.parser')

print(soup.prettify())

How can I access this data? I want to use it for analysis.

Answer 1

The data is rendered by JavaScript, so you need to wait for the page to finish loading before reading page_source.

Use WebDriverWait() and wait for visibility_of_element_located():
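Conceptually, WebDriverWait(driver, 10).until(condition) calls the condition repeatedly until it returns something truthy or 10 seconds pass, and visibility_of_element_located checks that the element is present in the DOM and visible. The standard-library sketch below reproduces only that polling loop to illustrate the idea; wait_until is a hypothetical helper, not part of Selenium. Selenium's own version polls every 0.5 s by default, ignores NoSuchElementException between attempts, and raises TimeoutException when the time runs out.

```python
import time


def wait_until(condition, timeout=10.0, poll_frequency=0.5):
    """Hypothetical, simplified stand-in for WebDriverWait(...).until(...).

    Calls condition() until it returns a truthy value and returns that value.
    Raises TimeoutError if timeout seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Usage: simulate a grid that is only filled in on the third check.
calls = {"count": 0}


def grid_is_filled():
    calls["count"] += 1
    return calls["count"] >= 3


print(wait_until(grid_is_filled, timeout=5, poll_frequency=0.01))  # True
print(calls["count"])                                              # 3
```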

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

# Choose the driver
driver = webdriver.Firefox()

# Open the site
site = 'http://mananciais.sabesp.com.br/HistoricoSistemas'
driver.get(site)

# Wait up to 10 seconds for the grid's content table to become visible
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.ID, "contenttabledivjqxGrid")))

# Parse the rendered HTML
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.prettify())
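Once page_source contains the rendered grid, the cell values still have to be pulled out for analysis. The sketch below does that with the standard library's html.parser and csv modules so it needs no extra packages. It assumes rows are marked with role="row" and cells with role="gridcell"; that is an assumption about the grid's rendered markup, so check it in the browser's element inspector and adjust row_role and cell_role if needed. The sample HTML is a hypothetical placeholder.

```python
import csv
import io
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class GridRowParser(HTMLParser):
    """Groups cell text into rows.

    ASSUMPTION: rows carry role="row" and cells carry role="gridcell".
    """

    def __init__(self, row_role="row", cell_role="gridcell"):
        super().__init__()
        self.row_role = row_role
        self.cell_role = cell_role
        self.rows = []
        self.current_row = None  # cells of the row currently being read
        self.cell_depth = 0      # >0 while inside a cell element
        self.cell_text = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.cell_depth:                 # nested element inside a cell
            self.cell_depth += 1
            return
        role = dict(attrs).get("role")
        if role == self.row_role:
            self.current_row = []
            self.rows.append(self.current_row)
        elif role == self.cell_role and self.current_row is not None:
            self.cell_depth = 1
            self.cell_text = []

    def handle_endtag(self, tag):
        if self.cell_depth and tag not in VOID_TAGS:
            self.cell_depth -= 1
            if self.cell_depth == 0:        # the cell element just closed
                self.current_row.append(" ".join(self.cell_text))

    def handle_data(self, data):
        if self.cell_depth and data.strip():
            self.cell_text.append(data.strip())


def rows_to_csv(html):
    """Returns the grid rows found in html as CSV text, skipping rows with no cells."""
    parser = GridRowParser()
    parser.feed(html)
    parser.close()
    buffer = io.StringIO()
    csv.writer(buffer).writerows(row for row in parser.rows if row)
    return buffer.getvalue()


# Hypothetical rendered markup with two rows; the second has an empty first cell.
sample = (
    '<div role="row"><div role="gridcell">a</div>'
    '<div role="gridcell"><b>b</b> c</div></div>'
    '<div role="row"><div role="gridcell"></div>'
    '<div role="gridcell">d</div></div>'
)
print(rows_to_csv(sample))
```

With Selenium, pass driver.page_source to rows_to_csv after the WebDriverWait call and write the result to a file. If you prefer BeautifulSoup, the same rows can be located with soup.find_all(attrs={"role": "row"}), under the same assumption about the markup.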

Hope this helps.


Solution 1
1 2020-03-24 16:13:02
