with open('./links.txt', 'r') as f:
    for line in f:
        browser.get(line)
        WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".jw-media")))
        title = None
        network = None
        subnetwork = None
        html = browser.page_source
        if isinstance(title, str):
            title = title.text
        else:
            with open('./notfound.txt', 'a') as h:
                h.write(line)
                h.write('\n')
                h.close()
            next(line)
For each line in f, it sets the variables title, network and subnetwork to None. Every time a page loads with the URL (which is each line) from links.txt, it will set the variables to the correct strings. The if statement checks whether the variables have changed; if they haven't, I want it to go to the next line and start again from the top: setting the variables to None, loading the page, etc. Is there any way of doing this?
You're looking for continue (see https://docs.python.org/3/tutorial/controlflow.html#break-and-continue-statements-and-else-clauses-on-loops), which skips the rest of the loop body and goes on to the next iteration of the loop.
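As a minimal sketch of how continue behaves, the snippet below replaces the browser with a hypothetical lookup table (FAKE_TITLES and process are made-up names for illustration, not part of Selenium or of the original code). A URL with no title is recorded as not found, and continue jumps straight to the next URL:

```python
# Hypothetical stand-in for loading a page: maps a URL to its title.
FAKE_TITLES = {
    "https://a.example": "Page A",
    "https://c.example": "Page C",
}


def process(urls):
    """Split urls into those with a title and those without."""
    found, not_found = [], []
    for url in urls:
        title = FAKE_TITLES.get(url)  # None when nothing was found
        if title is None:
            not_found.append(url)
            continue  # skip the rest of the body, start the next iteration
        # Only reached when a title was found.
        found.append((url, title))
    return found, not_found


found, not_found = process(
    ["https://a.example", "https://b.example", "https://c.example"]
)
```

Each iteration starts from the top of the loop body, so any variables assigned there are set afresh for every URL.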
The problem with your code is that the next(line) statement makes no sense: line is a string, not an iterator, so the call raises a TypeError (the for loop already takes care of advancing to the next line for you). Also, calling h.close() inside the with block is unnecessary, as close will be called automatically when you leave the with block. Finally, it is not good practice to open and close the file at every iteration of the loop.
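Both points can be checked without a browser. The sketch below uses an in-memory io.StringIO in place of notfound.txt (an assumption made only to keep the example self-contained) and a made-up URL:

```python
import io

line = "https://example.com\n"  # made-up URL for illustration

# next() expects an iterator; a str is iterable but is not itself an iterator.
error = None
try:
    next(line)
except TypeError as exc:
    error = str(exc)

# The with block closes the file-like object on exit, so no explicit close().
with io.StringIO() as h:
    h.write(line)
    closed_inside = h.closed  # still open inside the block
closed_after = h.closed  # closed once the block has been left
```

Here closed_inside is False and closed_after is True, which is why an explicit h.close() adds nothing.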
The code below addresses these points:
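from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# browser is assumed to be an already-created WebDriver instance
with open('./links.txt', 'r') as f:
    with open('./notfound.txt', 'a') as h:
        for line in f:
            browser.get(line.strip())  # Remove surrounding whitespace, including the final '\n'
            WebDriverWait(browser, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, ".jw-media")))
            title = browser.title  # This is how you get the title of the page
            network = None
            subnetwork = None
            html = browser.page_source
            if isinstance(title, str):  # Not sure what your goal is with this check
                pass  # browser.title is already a str (a str has no .text attribute)
            else:
                h.write(line)  # You don't need to add the '\n' because line already has it
                continue  # Go on to the next line of links.txt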