Python/Selenium: "This site can't be reached" caused "unknown error: cannot determine loading status"

I'm running a script that works well for scraping some data I need. The script crawls the existing URLs on a given web page and visits each one to get its final URL. The problem occurs when the final URL cannot be reached ("This site can't be reached"): the code crashes and I get this in the log:

    selenium.common.exceptions.WebDriverException: Message: unknown error: session deleted because of page crash
    from unknown error: cannot determine loading status
    from tab crashed
    (Session info: chrome=84.0.4147.135)
    (Driver info: chromedriver=2.43.600210 (68dcf5eebde37173d4027fa8635e332711d2874a),platform=Windows NT 6.1.7601 SP1 x86_64)

Here is the code I use to scrape the final URLs:

    import time
    from selenium.webdriver.common.by import By

    # Open the link (it opens in a new tab)
    elem = driver.find_element(By.XPATH, '//*[@id="popup__teaser"]/div[6]/div/div/a')
    elem.click()
    time.sleep(2)

    # Wait for the redirection to load, switch to the new tab, then grab the new URL
    driver.get(driver.current_url)  # note: this reloads the tab that still has focus (the original one)
    time.sleep(1)
    driver.switch_to.window(driver.window_handles[1])
    URL = driver.current_url

    # Close the active tab
    driver.close()

    # Switch back to the main tab
    driver.switch_to.window(driver.window_handles[0])

Can anybody help with this issue? It only happens when the redirection URL is not found. Thanks

EDIT: I've tried adding chrome_options.add_argument('--disable-dev-shm-usage') but it didn't work.

EDIT2: Here is the URL causing the crash

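Try importing requests and checking the status code of the site before opening it in Selenium. A page that loads normally returns status code 200; any other code suggests the page is not available. Note that when the host cannot be reached at all, requests.get raises an exception (for example requests.exceptions.ConnectionError) instead of returning a status code.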

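    import requests

    # Only open the URL in Selenium if it answers with HTTP 200
    if requests.get(url, timeout=10).status_code != 200:
        pass  # e.g. skip this URL instead of opening it in the browser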
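A slightly more defensive version of this check is sketched below. It assumes the redirect is an ordinary HTTP redirect: requests does not run JavaScript, so script-driven redirects are not followed. When the host cannot be reached at all (the case Chrome reports as "This site can't be reached"), requests raises an exception rather than returning a status code, so the sketch catches requests.exceptions.RequestException. The helper name resolve_final_url and the 10-second timeout are illustrative choices, not part of the original answer.

```python
import requests


def resolve_final_url(url, timeout=10):
    """Return the URL reached after following HTTP redirects, or None.

    Hypothetical helper: None means the site could not be reached or did
    not answer with status 200, so the caller can skip it instead of
    opening it in Selenium.
    """
    try:
        # requests follows HTTP redirects for GET by default;
        # response.url is the address of the last response in the chain.
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.RequestException:
        # DNS failure, refused connection, timeout, malformed URL, ...
        return None
    if response.status_code != 200:
        return None
    return response.url
```

In the question's loop, one could read the link target with elem.get_attribute("href"), pass it to resolve_final_url, and skip the link when the result is None instead of clicking it. If the redirect happens in JavaScript, the Selenium route is still needed, and this check only filters out hosts that are unreachable.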
