简体   繁体   中英

Nested for-while loop stops iteration after first run

Platform:
Python version: 3.7.3
Selenium Version: 3.141.0
OS: Win7

Issue:
I have a url list as a text file, with each url on a separate line. The urls are download link. I want to iterate through all the urls and download the files linked to each url, into a specific folder.

The code that I have tried is a nested for-while loop. The first iteration goes through without any issue, but the second iteration gets stuck on one of the while loop.

There obviously is a better way of doing what I am trying to do. I am just a beginner in python and learning the language as best as I can.

My Url List:

https://mega.nz/#!bOgBWKiB!AWs3JSksW0mpZ8Eob0-Qpr5ZAG0N1zhoFBFVstNJfXs
https://mega.nz/#!qPxGAAYJ!BX-hv7jgE4qvBs_uhHPVpsLRm1Yl4JkZ17nI1-U6hvk
https://mega.nz/#!GPoiHaaT!TAKT4sOhIiMUSFFSmlvPOidMcscXzHH_8HgK27LyTRM

Code that I have tried:

import os
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep
from pathlib import Path
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary('C:\\Program Files\\Mozilla Firefox\\firefox.exe')
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", "H:\\downloads")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/zip")
driver = webdriver.Firefox(firefox_binary=binary, firefox_profile=fp, executable_path=r'C:\\Program Files\\Python\\Python37\\Lib\\site-packages\\selenium\\webdriver\\firefox\\geckodriver.exe')
driver.set_window_size(1600, 1050)
with open("H:\\downloads\\my_url_list.txt", "r") as f:
    for url in f:
        driver.get(url.strip())
        sleep(5)
        while True:
            # checks whether the element is available on the page, used 'while' instead of 'wait' as I couuldn't figure out the wait time.
            try:
                content = driver.find_element_by_css_selector('div.buttons-block:nth-child(1) > div:nth-child(2)')
                break
            except NoSuchElementException:
                continue
        # used 'execute_script' instead of 'click()' due to "scroll into view error"
        driver.execute_script("arguments[0].click();", content)
        sleep(5)
        while True:
            # checks whether 'filename' element is available on the page, the page shows multiple elements depending on interaction.
            if driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]"):
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]").text
                break
            elif driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"):
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]").text
                break
            else:
                sleep(5)
        print(filename)
        dirname = 'H:\\downloads'
        suffix = '.zip'
        file_path = Path(dirname, filename).with_suffix(suffix)
        while True:
            # checks whether the file has downloaded into the folder.
            if os.path.isfile(file_path):
                break

What's happening:

The first iteration goes through - File (linked to the url) gets downloaded into the H:\\\\downloads folder and filename gets printed.

In case of the second iteration, the file gets downloaded into the folder but the filename doesn't get printed, the second while loop involved goes into indefinite loop.

No for iteration after the second run, as the filename can't be retrieved in the second iteration, loop goes into indefinite mode.

Second while loop in the code above:

while True:  
            # checks whether 'filename' element is available on the page, the page shows multiple elements depending on interaction.  
            if driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]"):  
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]").text  
                break  
            elif driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"):  
                filename = driver.find_element_by_xpath("/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]").text  
                break  
            else:  
                sleep(5) 

Attached images for filename xpath option ( the reason why two different xpath were chosen for filename)

  1. while loop first option

在此处输入图片说明

  1. while loop second option

在此处输入图片说明

What you are searching for is an explicit wait, I advise you to visit this page from Selenium-python documentation. I quote from the page:

An explicit wait is a code you define to wait for a certain condition to occur before proceeding further in the code. The extreme case of this is time.sleep(), which sets the condition to an exact time period to wait. There are some convenience methods provided that help you write code that will wait only as long as required. WebDriverWait in combination with ExpectedCondition is one way this can be accomplished.

If you want to know more about ExpectedCondition you can visit this link of the documentation

I suggest this code for your case, using a lambda function because you are waiting for at least one element.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://somedomain/url_that_delays_loading")
try:
    xpath1="/html/body/div[6]/div[3]/div/div[1]/div[4]/div[1]/div/span[1]"
    xpath2="/html/body/div[6]/div[3]/div/div[1]/div[5]/div/div/div[1]/div[1]/div[2]/div[3]/div[1]/span[1]"
    timeLimit = 15 #seconds, you really need to set a time out.
    element = WebDriverWait(driver, timeLimit).until( lambda driver: driver.find_elements(By.xpath, xpath1) or driver.find_elements(By.xpath, xpath2) )
finally:
    pass

This waits up to 15 seconds before throwing a TimeoutException unless it finds one of the elements you are waiting by xpath. WebDriverWait by default calls the ExpectedCondition every 500 milliseconds until it returns successfully so you don't need to handle the logic and the loops as you are trying to do.

For handling the TimeoutException you can for exemple refresh the page.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM