
Why is this Python code running so slowly?

I am trying to take screenshots of the URLs below with Selenium, but when I run this code it runs very, very slowly.

The strange thing is that it sometimes runs at normal speed, but most of the time it is very slow, so I need some help.

I just write the screenshots and URLs into an HTML file, so don't be confused by that part.

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.common.exceptions import TimeoutException

waybackurls401 = {}
waybackurls403 = {}

webarchive_urls403 = []
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx.com/')
webarchive_urls403.append('https://web.archive.org/web/2012062112352/http://xx2.com/')
print "\t[~] Finding URLs with status code 403\n"

GECKODRIVER = 'F:/geckodriver.exe'

# Run Firefox without a visible window
firefox_options = Options()
firefox_options.add_argument("-headless")
driver = webdriver.Firefox(executable_path=GECKODRIVER, firefox_options=firefox_options)

for x in webarchive_urls403:
    try:
        print "\t", x
        driver.get(x)
        # Note: this timeout is only set after the first get(), so the first
        # page load still runs under the driver's default page-load timeout.
        driver.set_page_load_timeout(6)
        # Build a file name from the part of the URL after the last 'web'
        imgfilename = x.split('web')[-1]
        # Replace '/' (path separator) and ':' (not allowed in Windows file names)
        newfile = imgfilename.replace('/', '.').replace(':', '_') + '.png'
        driver.get_screenshot_as_file(newfile)
        value = "<td><img src='file:///F:/master/{0}' width='20%' height='25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src='file:///F:/master/{1}' width='30%' height='20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
    except TimeoutException as ex:
        print "Can't take screenshot: timeout."
driver.quit()
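The step that writes the HTML file is not shown in the question. As a minimal Python 3 sketch, assuming the `<tr>` strings collected as keys of `waybackurls403` are simply wrapped in a `<table>`, it could look like this. `make_row`, `write_report`, and the output path are hypothetical names, not part of the original code. Unlike the string formatting in the loop, this version quotes the `src` attribute and HTML-escapes the URL.

```python
import html


def make_row(url, image_path):
    """Build one table row: the URL on the left, its screenshot and a link on the right.

    Hypothetical helper mirroring the `key` string built in the loop.
    """
    safe_url = html.escape(url, quote=True)
    safe_img = html.escape("file:///" + image_path, quote=True)
    return (
        '<tr><td width="50%">{0}</td>'
        '<td width="50%"><img src="{1}" width="30%" height="20%"><br>'
        '<a href="{0}">URL</a></td></tr>'
    ).format(safe_url, safe_img)


def write_report(rows, path):
    """Wrap the collected rows in a minimal HTML page and write it to `path`."""
    with open(path, "w", encoding="utf-8") as f:
        f.write('<html><body><table border="1">\n')
        for row in rows:
            f.write(row + "\n")
        f.write("</table></body></html>\n")
```

For example, `write_report(waybackurls403.keys(), 'report.html')` would turn the dictionary built in the loop into a single page.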

    

EDIT:

Following Kiril's comment, I added some print statements to see where it actually stops.

import time

start = time.time()  # reference point for the elapsed-time prints
for x in webarchive_urls403:
    print time.time() - start
    try:
        print "\t", x
        print 'test122'
        driver.get(x)
        print 'test1'
        # Set after get(), so it does not apply to the first page load
        driver.set_page_load_timeout(10)

        imgfilename = x.split('web')[-1]
        newfile = imgfilename.replace('/', '.').replace(':', '_') + '.png'
        driver.get_screenshot_as_file(newfile)
        print 'test2'
        value = "<td><img src='file:///F:/AutoRecon-master/{0}' width='20%' height='25%'></td>".format(newfile)
        key = "<tr><td width=\"50%\">{0}</td><td width=\"50%\"><img src='file:///F:/AutoRecon-master/{1}' width='30%' height='20%'><br><a href=\"{2}\">URL</a></td></tr>".format(x, newfile, x)
        waybackurls403[key] = value
        print 'test3'
    except TimeoutException as ex:
        print ex

driver.quit()

As you can see, I added some marker prints (for example, print 'test122') to see where it gets stuck.

I found that test122 is printed but test1 is not, which means the code is stuck inside the driver.get() call.

That is the whole problem.
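Instead of marker prints such as test122 and test1, each step can be timed so the output shows how long it took, not only where it stopped. A minimal Python 3 sketch using only the standard library; the `timed` helper and the `log` list are hypothetical names.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, log):
    """Measure the wall-clock time of the enclosed block and append (label, seconds) to log.

    The entry is recorded even if the block raises, so a timed-out
    page load still shows up in the log.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log.append((label, elapsed))
        print("{0}: {1:.2f} s".format(label, elapsed))
```

In the loop, `driver.get(x)` and `driver.get_screenshot_as_file(newfile)` would each be wrapped in a `with timed(...)` block, which makes it easy to compare a fast run with a slow one.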

It seems to me that your site just takes a long time to load. You could try the following:

  • Check whether you have a bad connection; use LAN instead of WLAN.
  • Load your page in a separate thread (at driver.get()) and carry on, even if the page hasn't fully loaded yet.
  • If you use a free or cheap proxy, consider buying a faster one.
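The second suggestion, loading the page in a thread and carrying on, could look roughly like the sketch below (Python 3, standard library only). `load_with_deadline` is a hypothetical helper: it runs a blocking call such as `driver.get` in a daemon thread and waits at most `timeout` seconds. It assumes nothing else uses the same driver while a load is still running, because a Selenium WebDriver instance is not thread-safe; calling `driver.set_page_load_timeout()` before the first `driver.get()` may be the simpler option when it is enough.

```python
import threading


def load_with_deadline(load, url, timeout):
    """Run load(url) in a background thread and wait at most `timeout` seconds.

    Returns True if load(url) finished in time, False if it is still running.
    An exception raised by load(url) within the deadline is re-raised here.
    """
    outcome = {}
    done = threading.Event()

    def worker():
        try:
            load(url)
        except Exception as exc:  # hand the error back to the caller
            outcome["error"] = exc
        finally:
            done.set()

    # Daemon thread: a page that never finishes loading won't keep the program alive.
    threading.Thread(target=worker, daemon=True).start()
    finished = done.wait(timeout)
    if finished and "error" in outcome:
        raise outcome["error"]
    return finished
```

With Selenium this would be called as `load_with_deadline(driver.get, x, 10)`. A False result only means the caller stopped waiting; the browser is still loading, so further commands on that driver may still block or conflict.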

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 