
python - continue the loop if this iteration takes too long

I have a problem with PhantomJS: it can hang inside a loop without reporting any error. I know my code is good, because after a restart the same iteration completes normally, and the hang (if any) happens somewhere later instead. What I have in mind is something like this:

i = 0
while i < len(url_list):
    try:
        driver.get(url_list[i])
        # do whatever needs to be done
        i = i+1
        # go on the next one
    except ThisIterationTakesTooLong:
        # try again for this one because the code is definitely good
        continue

Is it even possible to do something like this? Basically I want something running in the background that checks how long the current iteration has been running. I know about time.time(), but the problem is that a timing check never runs if the code hangs on a command before reaching it.


EDIT
After looking at the suggested question, I still have the problem, because the signal module doesn't work as it should.

import signal
signal.alarm(5)

This throws "AttributeError: 'module' object has no attribute 'alarm'"
So it looks like I can't really use this.
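For context, signal.alarm only exists on Unix, which is why it raises AttributeError on Windows. A cross-platform workaround is to run the blocking call in a worker thread and simply stop waiting for it after a timeout. Below is a minimal stdlib sketch of that idea; slow_task is a hypothetical stand-in for a call like driver.get(url). Note that the hung thread is not actually killed, the main loop just stops waiting for it:

```python
import concurrent.futures
import time

def slow_task(seconds):
    # Stand-in for a call that may hang, e.g. driver.get(url)
    time.sleep(seconds)
    return "done"

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(slow_task, 0.5)
try:
    result = future.result(timeout=0.1)  # give up waiting after 0.1 s
except concurrent.futures.TimeoutError:
    result = None                        # took too long; retry or move on
executor.shutdown(wait=False)
```

The downside is that an abandoned call keeps running in its thread, so this only helps when hangs are rare enough that leaked threads don't pile up.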

I've run into this kind of thing before and, unfortunately, there's no pretty way around it. The fact is, sometimes pages/elements just won't load, and you have to make a choice about it. I usually end up doing something like this:

from selenium.common.exceptions import TimeoutException

# How long to wait for page before timeout
driver.set_page_load_timeout(10)

def wait_for_url(driver, url, max_attempts):
    """Make up to max_attempts attempts to load the page,
    relying on the page load timeout set above."""

    attempts = 0

    while attempts < max_attempts:

        try:
            driver.get(url)
            return True

        except TimeoutException:
            # Prepare for another attempt
            attempts += 1

    # Bail after max_attempts
    return False

# We'll use this if we find any urls that won't load
# so we can process later. 
revisit = []

for url in url_list:

    # Make 10 attempts before giving up.
    url_is_loaded = wait_for_url(driver, url, 10)

    if url_is_loaded:
        # Do whatever needs to be done with the loaded page
        pass
    else:
        revisit.append(url)

# Now we can try to process those unvisited URLs.
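The retry-then-revisit control flow above can be exercised without a browser. In this sketch, make_flaky_get builds a hypothetical stand-in for driver.get that fails a fixed number of times before succeeding:

```python
def make_flaky_get(failures_before_success):
    """Return a fake page loader that raises TimeoutError
    until it has failed the given number of times."""
    state = {"failures": failures_before_success}

    def flaky_get(url):
        if state["failures"] > 0:
            state["failures"] -= 1
            raise TimeoutError(f"timed out loading {url}")
        return True

    return flaky_get

def wait_for_url(get, url, max_attempts):
    """Retry get(url) up to max_attempts times."""
    for _ in range(max_attempts):
        try:
            get(url)
            return True
        except TimeoutError:
            continue  # prepare for another attempt
    return False

revisit = []
get = make_flaky_get(2)  # fails twice, then loads
loaded = wait_for_url(get, "http://example.com", 3)
if not loaded:
    revisit.append("http://example.com")
```

With three attempts allowed, the two simulated failures are absorbed and the URL never lands in the revisit list.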

I would also add that the issue might be with PhantomJS itself. The most recent versions of Selenium deprecate it, and in my experience PhantomJS is sluggish and prone to unexpected behavior. If you need headless operation, you can go with headless Chrome, which is very stable. If you're not familiar, that looks like:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")

driver = webdriver.Chrome("path/to/chromedriver", chrome_options=chrome_options)

Maybe one of those suggestions will help.
