I'm trying to scrape this page to get generation data to pass to a parser later on.
My problem is that the table is populated by multiple scripts that make requests to another server. Beautiful Soup parses the page, but the JavaScript has not been executed at that point, so the table is empty. So I'm trying to use Selenium to open the page in a browser and then scrape the populated table.
When I run my code, Firefox loads the page and then closes, but Beautiful Soup still returns the page without the table being populated. Inspecting the fully loaded page in the web console, I can see the data I need; for example, one data point is contained in a div tag with class="r11". Searching for this tag in my code returns None.
My thoughts are that either a) I'm using Selenium wrong or b) the page's structure is throwing things off, since it looks to be quite deeply nested with several "sub documents" (not sure of the correct term).
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
arg_therm = ('http://portalweb.cammesa.com/MEMNet1/Pages/Informes%20por%20'
'Categor%C3%ADa/Operativos/VisorReporteSinComDesp_minimal.asp'
'x?hora=0&titulo=Despacho%20Generacion%20Termica&reportPath='
'http://lauzet:5000/MemNet1/ReportingServices/Despacho'
'GeneracionTermica.rdl--0--Despacho+Generaci%c3%b3n+T%c3%a9rmica')
browser = webdriver.Firefox()
browser.get(arg_therm)
html_source = browser.page_source
browser.quit()
soup = BeautifulSoup(html_source, 'lxml')
print(soup.prettify())
print(soup.find('div', {"class": "r11"}))
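The table sits inside nested iframes, so you need to switch into them before searching for the cells. Try the code below to get the required table cells: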
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
browser = webdriver.Firefox()
browser.get(arg_therm)
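# find_element_by_xpath was removed in Selenium 4.3; use find_element(By.XPATH, ...) instead
outer_frame = browser.find_element(By.XPATH, '//iframe[starts-with(@name, "RportFramectl00")]')
browser.switch_to.frame(outer_frame)
# the report itself lives in a second frame nested inside the first one
browser.switch_to.frame('report')
# wait up to 10 seconds for the cells to be present in the inner frame
table_cells = wait(browser, 10).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "r11")))
for cell in table_cells:
    print(cell.text)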
This waits up to 10 seconds for the required elements to appear and returns a list of those divs. If none appear in time, a TimeoutException is raised.
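To see why the original search returned None, it may help to look at what the outer document actually contains. The following is only a sketch on invented sample HTML: the frame name `outer_frame`, the `viewer.aspx` source and the cell value are made up for illustration and are not taken from the real page. It also uses the standard library's `HTMLParser` as a stand-in for Beautiful Soup so that it runs without extra packages. The point it tries to show is that the parent page's source holds only the `<iframe>` tag, while the `r11` div lives in the frame's own document, which is presumably why `browser.page_source` had nothing to find before the frame switches.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record tags that carry a given class, plus the names of any iframes."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.matches = []   # tag names of elements with the wanted class
        self.iframes = []   # name attributes of iframes seen in this document

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "iframe":
            self.iframes.append(attrs.get("name"))
        # the class attribute may be missing or hold several space-separated names
        if self.wanted_class in (attrs.get("class") or "").split():
            self.matches.append(tag)


# Hypothetical sample documents (assumptions, not the real page):
# the parent page only references the frame, the frame holds the data.
OUTER_HTML = ('<html><body>'
              '<iframe name="outer_frame" src="viewer.aspx"></iframe>'
              '</body></html>')
FRAME_HTML = '<html><body><div class="r11">123</div></body></html>'


def scan(html, wanted_class="r11"):
    """Return (matching tag names, iframe names) for one HTML document."""
    finder = ClassFinder(wanted_class)
    finder.feed(html)
    return finder.matches, finder.iframes


outer_matches, outer_iframes = scan(OUTER_HTML)
frame_matches, _ = scan(FRAME_HTML)

print(outer_matches, outer_iframes)  # [] ['outer_frame']
print(frame_matches)                 # ['div']
```

After the two `switch_to.frame` calls, `browser.page_source` should return the inner frame's document rather than the parent page, so it could be handed to BeautifulSoup as in the question if you prefer parsing over Selenium lookups.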