Why does this Python Selenium code run so slowly?

Question

I am trying to take screenshots of the URLs below using Selenium, but when I run this code it is extremely slow.

Strangely, it sometimes runs at normal speed, but most of the time it is very slow, so I need some help.

I just write the screenshots and URLs into an HTML file, so don't let that part confuse you.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options

waybackurls401 = {}
waybackurls403 = {}

webarchive_urls403 = []
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx.com/')
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx2.com/')
print "\t[~]Finding 403 status code urls\n"

GEckodriver = 'F:/geckodriver.exe'

firefox_options = Options()
firefox_options.add_argument("-headless")
driver = webdriver.Firefox(executable_path=GEckodriver, firefox_options=firefox_options)

for x in webarchive_urls403:
    
    try:
    
        print "\t", x
        driver.get(x)
        driver.set_page_load_timeout(6)
        imgfilename = x.split('web')[-1]
        newfile= imgfilename.replace('/', '.') +'.png'
        driver.get_screenshot_as_file(newfile)
        value = "<td><img src= file:///F:/master/{0} width='20%' height= '25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src= file:///F:/master/{1} width='30%' height= '20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
        
    except TimeoutException as ex:  
        print "Can't take screenshot: page load timed out."
driver.quit()

EDIT:

Following Kiril's comment, I made some changes to see where it actually stops.

import time

start = time.time()  # reference point for the elapsed-time prints below
for x in webarchive_urls403:
    print time.time()-start
    try:
    
        print "\t", x
        print 'test122'
        driver.get(x)
        print 'test1'
        driver.set_page_load_timeout(10)
        
        imgfilename = x.split('web')[-1]
        newfile= imgfilename.replace('/', '.') +'.png'
        driver.get_screenshot_as_file(newfile)
        print 'test2'
        value = "<td><img src= file:///F:/AutoRecon-master/{0} width='20%' height= '25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src= file:///F:/AutoRecon-master/{1} width='30%' height= '20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
        print 'test3'
    except TimeoutException as ex:  
        print ex
    
    

driver.quit()

As you can see, I added some debug prints (for example, print 'test122') to see where it actually gets stuck.

I found that test122 is printed but test1 is not, which means the code gets stuck inside the driver.get() call.

That is the whole problem.
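The print-based tracing above could also be written as a small timing helper, so each step reports how long it took instead of only whether it was reached. This is just a sketch: the name timed and its log parameter are made up for illustration, and it assumes Python 3 (or from __future__ import print_function on Python 2).

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log=print):
    """Report how long the wrapped block took, even if it raises."""
    start = time.time()
    try:
        yield
    finally:
        # Runs on both normal exit and exceptions, so a timeout is still timed.
        log("{0}: {1:.2f}s".format(label, time.time() - start))


# Intended use inside the loop (driver and x come from the code above):
#
#     with timed("driver.get"):
#         driver.get(x)
#     with timed("screenshot"):
#         driver.get_screenshot_as_file(newfile)
```

A step that never prints its line is the one that is blocking, and the printed durations show which step dominates when the script does finish.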

Answer 1

It seems to me that your site just takes a long time to load. You could try the following:

Check whether you have a bad connection; use LAN instead of WLAN.
Load your page in a thread (at driver.get()) and carry on even if the page has not fully loaded yet.
If you use a free or cheap proxy, consider buying a faster one.
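One more thing that may be worth trying, sketched below: in the question's code, set_page_load_timeout() is called after driver.get(), so the first load is not covered by the 6 or 10 second limit and falls back to the driver's default (300 seconds in the WebDriver specification). Setting the timeout once before the loop lets every driver.get() give up early. The function name capture_screenshots, the timeout_seconds parameter and the screenshot_N.png naming are assumptions for illustration, not part of the original code, and the sketch is written for Python 3.

```python
try:
    from selenium.common.exceptions import TimeoutException
except ImportError:
    # Fallback so the sketch can be read and exercised without Selenium installed.
    class TimeoutException(Exception):
        pass


def capture_screenshots(driver, urls, timeout_seconds=10):
    """Visit each URL and save a screenshot, giving up on slow page loads.

    `driver` is assumed to be an already-created Selenium WebDriver.
    Returns a dict mapping each URL to its screenshot filename, or to None
    when the page load timed out.
    """
    # Set the limit once, BEFORE the first driver.get(). A timeout set after
    # get() only applies to later loads, not to the one that already ran.
    driver.set_page_load_timeout(timeout_seconds)

    results = {}
    for index, url in enumerate(urls):
        filename = "screenshot_{0}.png".format(index)  # hypothetical naming scheme
        try:
            driver.get(url)
        except TimeoutException:
            print("Timed out after {0}s: {1}".format(timeout_seconds, url))
            results[url] = None
            continue  # move on to the next URL instead of hanging
        driver.get_screenshot_as_file(filename)
        results[url] = filename
    return results
```

With the driver from the question, this would be called as capture_screenshots(driver, webarchive_urls403, timeout_seconds=6), followed by driver.quit(). If the archive pages really do take longer than the limit, they will be skipped rather than captured, so the timeout value is a trade-off.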

1 answers

solution1
0 2023-01-30 10:18:39
