
python - continue the loop if this iteration takes too long

I have a problem with PhantomJS: it can hang inside a loop without reporting any error. I know my code is good, because after a restart the same iteration completes normally, and the hang (if any) happens somewhere later instead. What I have in mind is something like this:

i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i = i+1
        # go on the next one
    except ThisIterationTakesTooLong:
        # try again for this one because the code is definitely good
        continue

Is it even possible to do something like this? Basically I want something running in the background that checks how long the current iteration has been running. I know about time.time(), but the problem is that a timing check never runs if the code hangs on a command before reaching it.


EDIT
After looking at the suggested question, I still have the problem, because the signal module doesn't work as it should.

import signal
signal.alarm(5)

This throws "AttributeError: 'module' object has no attribute 'alarm'"
So it looks like I can't really use this.
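For context, signal.alarm only exists on Unix, which is why it raises AttributeError on Windows. A cross-platform workaround is to run the blocking call in a worker thread and simply stop waiting for it after a timeout. Below is a minimal stdlib sketch of that idea; slow_task is a hypothetical stand-in for a call like driver.get(url). Note that the hung thread is not actually killed, the main loop just stops waiting for it:

```python
import concurrent.futures
import time

def slow_task(seconds):
    # Stand-in for a call that may hang, e.g. driver.get(url)
    time.sleep(seconds)
    return "done"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_task, 0.5)
try:
    result = future.result(timeout=0.1)  # give up waiting after 0.1 s
except concurrent.futures.TimeoutError:
    result = None                        # took too long; retry or move on
executor.shutdown(wait=False)
```

The downside is that an abandoned call keeps running in its thread, so this only helps when hangs are rare enough that leaked threads don't pile up.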

I've run into this kind of thing before and, unfortunately, there's no pretty way around it. The fact is, sometimes pages/elements just won't load, and you have to make a choice about it. I usually end up doing something like this:

from selenium.common.exceptions import TimeoutException

# How long to wait for page before timeout
driver.set_page_load_timeout(10)

def wait_for_url(driver, url, max_attempts):
    """Make up to max_attempts attempts to load the page,
    relying on the page load timeout set above."""

    attempts = 0

    while attempts < max_attempts:

        try:
            driver.get(url)
            return True

        except TimeoutException:
            # Prepare for another attempt
            attempts += 1

    # Bail after max_attempts
    return False

# We'll use this if we find any urls that won't load
# so we can process later. 
revisit = []

for url in url_list:

    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)

    if url_is_loaded:
        # Do whatever needs to be done with the loaded page
        pass
    else:
        revisit.append(url)

# Now we can try to process those unvisited URLs.
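The retry-then-revisit control flow above can be exercised without a browser. In this sketch, make_flaky_get builds a hypothetical stand-in for driver.get that fails a fixed number of times before succeeding:

```python
def make_flaky_get(failures_before_success):
    """Return a fake page loader that raises TimeoutError
    until it has failed the given number of times."""
    state = {"failures": failures_before_success}

    def flaky_get(url):
        if state["failures"] > 0:
            state["failures"] -= 1
            raise TimeoutError(f"timed out loading {url}")
        return True

    return flaky_get

def wait_for_url(get, url, max_attempts):
    """Retry get(url) up to max_attempts times."""
    for _ in range(max_attempts):
        try:
            get(url)
            return True
        except TimeoutError:
            continue  # prepare for another attempt
    return False

revisit = []
get = make_flaky_get(2)  # fails twice, then loads
loaded = wait_for_url(get, "http://example.com", 3)
if not loaded:
    revisit.append("http://example.com")
```

With three attempts allowed, the two simulated failures are absorbed and the URL never lands in the revisit list.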

I would also add that the issue might be with PhantomJS itself. The most recent versions of Selenium deprecate it, and in my experience PhantomJS is sluggish and prone to unexpected behavior. If you need headless operation, you can go with headless Chrome, which is very stable. If you're not familiar, that looks like:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome("path/to/chromedriver", chrome_options=chrome_options)

Maybe one of those suggestions will help.
