Python selenium PhantomJS proxy

This is my code:

from selenium import webdriver

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']
for s in range (len(proxylist)):
    service_args = ['--proxy=%s'%(proxylist[s]),'--proxy-type=socks5']
    driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
    for s in weblist:
        driver.get(s)

The idea is that the browser first uses proxylist[0] to visit those sites. If proxylist[0] times out at weblist[2], then proxylist[1] should take over and continue with weblist[3]. I think I should use try and except, but I don't know where to put them. Thanks for any help!

Try something like this. Basically, we are switching the inner and outer loops and adding a try/except:

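from selenium import webdriver
from selenium.common.exceptions import TimeoutException

for url in weblist:
    for proxy in proxylist:
        service_args = ['--proxy=%s' % proxy, '--proxy-type=socks5']
        driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
        try:
            driver.get(url)
            # Process the page here, before the driver is shut down.
            break  # loaded successfully, move on to the next URL
        except TimeoutException:
            print('timed out')
        finally:
            driver.quit()  # runs on success and on timeout alike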
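
The proxy flags are built by hand each time a driver is created, so a small helper can keep the formatting in one place. This is only a sketch: the name `build_service_args` is hypothetical (it is not part of Selenium), and it assumes PhantomJS's `--proxy` and `--proxy-type` command-line options with the types `http`, `socks5` and `none`. Proxies listening on port 80 are often plain HTTP proxies, in which case `http` may be the type you want rather than `socks5`.

```python
def build_service_args(proxy, proxy_type='socks5'):
    """Return PhantomJS command-line flags for one 'host:port' proxy."""
    allowed = ('http', 'socks5', 'none')
    if proxy_type not in allowed:
        raise ValueError('proxy_type must be one of %s' % (allowed,))
    return ['--proxy=%s' % proxy, '--proxy-type=%s' % proxy_type]


# Example: flags for the first proxy in the question's list.
args = build_service_args('58.12.12.12:80')
print(args)
```

The returned list can be passed straight to `webdriver.PhantomJS(..., service_args=args)`.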

A try/except for a page-load timeout looks something like this:

from selenium.common.exceptions import TimeoutException

try:
    driver.set_page_load_timeout(1)
    driver.get("http://www.example.com")
except TimeoutException as ex:
    print("Exception has been thrown. " + str(ex))

Applied to your code, it would look something like this:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']


def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        if temp_count_proxy >= len(proxylist):
            print("Out of proxy")
            return

        if driver_opened == 0:
            service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]),'--proxy-type=socks5']
            driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
            driver_opened = 1

        try:
            driver.set_page_load_timeout(2)
            driver.get(url)
        except TimeoutException as ex:
            driver.quit()  # quit() also stops the PhantomJS process
            driver_opened = 0
            temp_count_proxy += 1
            continue

test()

Be careful: if it fails to load one URL, it switches to the next proxy and moves on to the next URL (as you requested); it does not retry the URL that failed.
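
To see that behaviour without launching a browser, the loop can be simulated with a stand-in for `driver.get`. Everything here is hypothetical scaffolding rather than Selenium API: `fake_get`, the `BAD` set of (proxy, url) pairs that "time out", and the built-in `TimeoutError` standing in for Selenium's `TimeoutException`.

```python
proxylist = ['58.12.12.12:80', '69.12.12.12:80']
weblist = ['https://www.google.com', 'https://www.facebook.com',
           'https://www.yahoo.com', 'https://aol.com']

# Hypothetical failure: the first proxy times out on the third site.
BAD = {('58.12.12.12:80', 'https://www.yahoo.com')}


def fake_get(proxy, url):
    """Stand-in for driver.get(); raises TimeoutError for pairs in BAD."""
    if (proxy, url) in BAD:
        raise TimeoutError('%s timed out on %s' % (proxy, url))


def visit_skipping_failures(urls, proxies, get=fake_get):
    """On a timeout, switch proxy and move on to the NEXT url."""
    visited, skipped = [], []
    proxy_index = 0
    for url in urls:
        if proxy_index >= len(proxies):
            break  # out of proxies
        try:
            get(proxies[proxy_index], url)
            visited.append((url, proxies[proxy_index]))
        except TimeoutError:
            skipped.append(url)
            proxy_index += 1
    return visited, skipped


visited, skipped = visit_skipping_failures(weblist, proxylist)
print(visited)
print(skipped)
```

With the simulated failure above, the yahoo URL ends up in `skipped` and the aol URL is fetched through the second proxy.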

If you want it to switch proxy and retry the current URL when a load fails, use the following:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']


def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        while True:

            if temp_count_proxy >= len(proxylist):
                print("Out of proxy")
                return

            if driver_opened == 0:
                service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]),'--proxy-type=socks5']
                driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
                driver_opened = 1

            try:
                driver.set_page_load_timeout(2)
                driver.get(url)
                # Your code to process here

            except TimeoutException as ex:
                driver.quit()  # quit() also stops the PhantomJS process
                driver_opened = 0
                temp_count_proxy += 1
                continue

            break

test()
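
The retry logic can also be exercised without a browser by passing in a stand-in for `driver.get`. This is a sketch under stated assumptions, not Selenium code: `visit_with_retry` and `flaky_get` are hypothetical names, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException`.

```python
proxylist = ['58.12.12.12:80', '69.12.12.12:80']
weblist = ['https://www.google.com', 'https://www.facebook.com',
           'https://www.yahoo.com', 'https://aol.com']


def visit_with_retry(urls, proxies, get):
    """On a timeout, switch proxy and retry the SAME url."""
    results = {}
    proxy_index = 0
    for url in urls:
        while proxy_index < len(proxies):
            proxy = proxies[proxy_index]
            try:
                get(proxy, url)
            except TimeoutError:
                proxy_index += 1  # give up on this proxy, try the next one
                continue
            results[url] = proxy
            break
        else:
            # The while condition failed: every proxy has been used up.
            print('Out of proxy')
            break
    return results


def flaky_get(proxy, url):
    """Hypothetical failure: the first proxy cannot load the yahoo URL."""
    if proxy == '58.12.12.12:80' and 'yahoo' in url:
        raise TimeoutError(url)


results = visit_with_retry(weblist, proxylist, flaky_get)
print(results)
```

Here the yahoo URL is retried and loaded through the second proxy, which then stays in use for the remaining URLs.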
