I am trying to take screenshots of the URLs below using Selenium, but when I run this code it is very slow.
The strange thing is that it sometimes runs at normal speed, but most of the time it is very slow, so I need some help.
I just write the screenshots and URLs into an HTML file, so don't let that part confuse you.
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options

waybackurls401 = {}
waybackurls403 = {}
webarchive_urls403 = []
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx.com/')
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx2.com/')
print "\t[~]Finding 403 status code URLs\n"
GEckodriver = 'F:/geckodriver.exe'
firefox_options = Options()
firefox_options.add_argument("-headless")
driver = webdriver.Firefox(executable_path=GEckodriver, firefox_options=firefox_options)
for x in webarchive_urls403:
    try:
        print "\t", x
        driver.get(x)
        driver.set_page_load_timeout(6)
        # Build the screenshot file name from the part of the URL after the last 'web'.
        imgfilename = x.split('web')[-1]
        newfile = imgfilename.replace('/', '.') + '.png'
        driver.get_screenshot_as_file(newfile)
        # HTML fragments for the report: key is the table row, value is the image cell.
        value = "<td><img src= file:///F:/master/{0} width='20%' height= '25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src= file:///F:/master/{1} width='30%' height= '20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
    except TimeoutException as ex:
        print "Can't take screenshot because of a timeout."
driver.quit()
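To make it easier to see which file name the loop produces for each URL, the split/replace step can be isolated as a small helper. This is only an illustrative sketch: the function name `screenshot_name` is hypothetical and not part of the original script, and it assumes Python 3.

```python
def screenshot_name(url):
    """Mirror the loop's logic: keep what follows the last 'web', swap '/' for '.'."""
    tail = url.split('web')[-1]
    return tail.replace('/', '.') + '.png'


if __name__ == "__main__":
    # Same first URL as in the question.
    print(screenshot_name('https://web.archive.org/web/2012062112352/http://xx.com/'))
    # prints: .2012062112352.http:..xx.com..png
```

Note that the resulting name still contains a ':' from the archived URL's scheme, which ordinary Windows file names do not accept, and that a domain containing "web" would shift where the split happens. Both may be worth checking separately from the speed problem.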
EDIT:
Following Kiril's comment, I made some changes to see where it actually stops.
import time

start = time.time()  # assumed: set once before the loop
for x in webarchive_urls403:
    print time.time() - start
    try:
        print "\t", x
        print 'test122'
        driver.get(x)
        print 'test1'
        driver.set_page_load_timeout(10)
        imgfilename = x.split('web')[-1]
        newfile = imgfilename.replace('/', '.') + '.png'
        driver.get_screenshot_as_file(newfile)
        print 'test2'
        value = "<td><img src= file:///F:/AutoRecon-master/{0} width='20%' height= '25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src= file:///F:/AutoRecon-master/{1} width='30%' height= '20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
        print 'test3'
    except TimeoutException as ex:
        print ex
driver.quit()
As you can see, I added some arbitrary print statements (for example, the 'test122' print) to see where it actually gets stuck.
I found that 'test122' is printed, but 'test1', which comes right after driver.get(), is not. That means the code is stuck inside the driver.get() call.
That is the whole problem.
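The same kind of tracing can be done with a small timing helper instead of scattered prints, so each step reports how long it took. This is a minimal sketch assuming Python 3; the helper name `timed` and its `log` parameter are hypothetical, and `time.sleep` merely stands in for a slow call such as driver.get().

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Report how long the wrapped block took, even if it raises."""
    start = time.time()
    try:
        yield
    finally:
        log("%s took %.2f s" % (label, time.time() - start))


if __name__ == "__main__":
    # time.sleep stands in for a slow call such as driver.get(url).
    with timed("driver.get"):
        time.sleep(0.1)
```

Wrapping driver.get(x) and driver.get_screenshot_as_file(newfile) in separate `with timed(...)` blocks would show which of the two accounts for the delay.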
It seems to me that your site just takes a long time to load. You could try the following: set the page load timeout before calling driver.get(), and carry on even if the page isn't fully loaded yet.
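A minimal sketch of carrying on after a slow page load, assuming Python 3 and a Selenium driver that has already been created as in the question. The names `capture_all` and `name_for` are hypothetical, and the fallback `TimeoutException` class exists only so the sketch can be run without Selenium installed. The point it illustrates is that the timeout is set before the first driver.get(), so it can actually interrupt a slow load.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class TimeoutException(Exception):
        pass


def capture_all(driver, urls, name_for, timeout_s=10):
    """Screenshot each URL, giving every page load at most timeout_s seconds.

    name_for is a hypothetical callable that maps a URL to a file name.
    """
    # Set the limit once, before the first driver.get(); a limit set after
    # get() has returned cannot shorten a load that is already finished.
    driver.set_page_load_timeout(timeout_s)
    saved, timed_out = [], []
    for url in urls:
        try:
            driver.get(url)
        except TimeoutException:
            # Carry on: note the slow page and still try to capture what rendered.
            timed_out.append(url)
        filename = name_for(url)
        if driver.get_screenshot_as_file(filename):
            saved.append(filename)
    return saved, timed_out
```

Whether a usable screenshot is available after a timeout may depend on the browser and driver version, so the list of timed-out URLs is returned separately and those pages can be retried or skipped.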