How do I scrape the refreshed data after being redirected to a new page using Selenium?

I'm working on a data-scraping task in Python, and I want to scrape the data from the new page that opens after I click a redirect button.

This is the code I have tried:

from bs4 import BeautifulSoup as BSoup
from selenium import webdriver

browser = webdriver.Firefox()
browser.get("https://www.cbsl.gov.lk/en/statistics/economic-indicators")
window_before = browser.window_handles[0]
print(window_before)
# Click the link that opens the statistics page in a new window
browser.find_element_by_xpath('/html/body/div[2]/div[3]/div/div/div/div/div/div/div/div/div/div/div/div/div/div/div/div[4]/div[2]/p[1]/a').click()
window_after = browser.window_handles[1]
browser.switch_to_window(window_after)
print(window_after)

# Parse the current page source and look for the table
bs_obj = BSoup(browser.page_source, 'lxml')
table = bs_obj.find("table", id="statTB")
print(table)

This opens the new page, but printing the table shows nothing. I think the code is still reading the old page.

No. Once you have switched to the new window, browser.page_source returns the HTML of the new window, but you may need to wait until the required table appears in the DOM:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

...
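browser.switch_to.window(window_after)  # replaces the deprecated switch_to_window()
# Wait up to 10 seconds for the table to become visible before reading it
table = WebDriverWait(browser, 10).until(EC.visibility_of_element_located((By.ID, "statTB")))
print(table.text)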
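
To illustrate why the explicit wait helps, here is a minimal sketch of the polling idea behind it: call a condition repeatedly until it returns something truthy or a timeout expires. This is a simplified illustration, not Selenium's actual implementation; the names wait_until and WaitTimeout are hypothetical and do not exist in Selenium.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        # Sleep briefly so the page has time to load between checks
        time.sleep(poll_interval)
```

With a real driver, the condition would be something like a function that looks up the table and returns it if found. Reading page_source immediately after the switch is equivalent to checking the condition once, which is why the table can be missing.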

You need more than one WebDriverWait: one to wait for the second window to open, and one to wait for the new page to load.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

browser = webdriver.Firefox()
browser.get("https://www.cbsl.gov.lk/en/statistics/economic-indicators")
window_before = browser.window_handles[0]
print(window_before)
browser.find_element(By.XPATH, '/html/body/div[2]/div[3]/div/div/div/div/div/div/div/div/div/div/div/div/div/div/div/div[4]/div[2]/p[1]/a').click()
# Wait until the click has opened the second window
WebDriverWait(browser, 20).until(EC.number_of_windows_to_be(2))
window_after = browser.window_handles[1]
browser.switch_to.window(window_after)  # replaces the deprecated switch_to_window()
print(window_after)

# Wait until the element with ID 'Grid' is present in the new page
myElem = WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, 'Grid')))
bs_obj = BeautifulSoup(browser.page_source, 'lxml')

table = bs_obj.find("table", id="statTB")
print(table)
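
The snippets above pick the new window with window_handles[1], which assumes the handles come back in the order the windows were opened. As far as I know, the WebDriver specification does not guarantee that order, so a slightly more defensive option is to compare the handles before and after the click. The helper below is a sketch; find_new_handle is a hypothetical name, not part of Selenium.

```python
def find_new_handle(handles_before, handles_after):
    """Return the single handle present in handles_after but not in handles_before."""
    known = set(handles_before)
    new_handles = [h for h in handles_after if h not in known]
    if len(new_handles) != 1:
        raise ValueError(f"expected exactly one new window, found {len(new_handles)}")
    return new_handles[0]


# Intended use with a real driver (assumes `browser` is a Selenium WebDriver):
#   handles_before = browser.window_handles      # take this before the click
#   ... click the link and wait for the second window ...
#   browser.switch_to.window(find_new_handle(handles_before, browser.window_handles))
```

This works regardless of where the new handle appears in the list, and it fails loudly if no window (or more than one) was opened.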
