Python selenium PhantomJS proxy

This is my code:

from selenium import webdriver

proxylist = ['58.12.12.12:80', '69.12.12.12:80']
weblist = ['https://www.google.com', 'https://www.facebook.com', 'https://www.yahoo.com', 'https://aol.com']
for proxy in proxylist:
    service_args = ['--proxy=%s' % proxy, '--proxy-type=socks5']
    driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
    for url in weblist:
        driver.get(url)

The idea is that the browser first uses proxylist[0] to visit those sites. If proxylist[0] times out at weblist[2], then proxylist[1] should continue the job starting with weblist[3]. I think I should use try and except, but I don't know where to put them. Thanks for your help!

Try something like this. Basically, we are swapping the inner and outer loops and adding a try/except.

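from selenium import webdriver
from selenium.common.exceptions import TimeoutException

for url in weblist:
    # Try each proxy in turn for this URL until one of them loads it
    for proxy in proxylist:
        service_args = ['--proxy=%s' % proxy, '--proxy-type=socks5']
        driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
        try:
            driver.set_page_load_timeout(2)  # seconds
            driver.get(url)
            break  # loaded: stop trying proxies for this URL
        except TimeoutException:
            print('timed out')
            driver.quit()  # discard this browser and try the next proxy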
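To check the per-URL fallback idea without launching PhantomJS, here is a minimal sketch under stated assumptions: `fetch(url, proxy)` is a hypothetical callable standing in for "start a browser with `--proxy=<proxy>` and call `driver.get(url)`", and Python's built-in `TimeoutError` stands in for Selenium's `TimeoutException`. Like the loop above, it starts again from the first proxy for every URL.

```python
from typing import Callable, Optional, Sequence


def fetch_with_fallback(
    url: str,
    proxies: Sequence[str],
    fetch: Callable[[str, str], None],
) -> Optional[str]:
    """Try each proxy in order for one URL; return the proxy that worked, or None."""
    for proxy in proxies:
        try:
            fetch(url, proxy)  # hypothetical: load `url` through `proxy`
        except TimeoutError:
            print("timed out: {} via {}".format(url, proxy))
            continue  # try the next proxy for the same URL
        return proxy  # the page loaded through this proxy
    return None  # every proxy timed out for this URL


# Usage with a fake fetcher: the first proxy times out on yahoo only.
def fake_fetch(url: str, proxy: str) -> None:
    if proxy == "58.12.12.12:80" and url == "https://www.yahoo.com":
        raise TimeoutError


proxylist = ["58.12.12.12:80", "69.12.12.12:80"]
weblist = ["https://www.google.com", "https://www.facebook.com",
           "https://www.yahoo.com", "https://aol.com"]
for url in weblist:
    print(url, "->", fetch_with_fallback(url, proxylist, fake_fetch))
```

Note that `https://aol.com` is loaded through `58.12.12.12:80` again, because each URL restarts the search at `proxylist[0]`. The question instead asks to stay on the new proxy once the first one fails, which the `temp_count_proxy` counter approach handles.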

A try/except around a page-load timeout looks something like this:

from selenium.common.exceptions import TimeoutException

try:
    driver.set_page_load_timeout(1)  # seconds
    driver.get("http://www.example.com")
except TimeoutException as ex:
    print("Exception has been thrown. " + str(ex))

For your code, adding it would look something like this:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']


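def test():
    temp_count_proxy = 0   # index of the proxy currently in use
    driver_opened = False  # whether a PhantomJS instance is running
    for url in weblist:
        # Use >= : once the index equals len(proxylist) there is no proxy left
        if temp_count_proxy >= len(proxylist):
            print("Out of proxy")
            return

        if not driver_opened:
            service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]), '--proxy-type=socks5']
            driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
            driver_opened = True

        try:
            driver.set_page_load_timeout(2)  # seconds
            driver.get(url)
        except TimeoutException:
            # Shut down this browser and switch to the next proxy;
            # the loop then moves on to the next URL (this one is not retried)
            driver.quit()
            driver_opened = False
            temp_count_proxy += 1

    if driver_opened:
        driver.quit()  # clean up the last browser


test()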
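The same counter logic can be exercised without a browser. This is an illustrative sketch under assumptions, not the answer's exact code: `fetch(url, proxy)` is a hypothetical callable standing in for opening PhantomJS with that proxy and calling `driver.get(url)`, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException`. It records which proxy loaded each URL, so the skip-on-timeout behaviour is easy to see.

```python
from typing import Callable, Dict, Optional, Sequence


def crawl_skip_on_timeout(
    urls: Sequence[str],
    proxies: Sequence[str],
    fetch: Callable[[str, str], None],
) -> Dict[str, Optional[str]]:
    """Visit URLs through the current proxy; on a timeout, switch proxy and skip that URL.

    Returns url -> proxy that loaded it, or None if the URL was skipped or never tried.
    """
    results: Dict[str, Optional[str]] = {url: None for url in urls}
    index = 0  # plays the role of temp_count_proxy
    for url in urls:
        if index >= len(proxies):  # no proxy left
            print("Out of proxy")
            break
        try:
            fetch(url, proxies[index])  # hypothetical: load `url` through this proxy
        except TimeoutError:
            index += 1  # the next URL uses the next proxy; this URL is not retried
            continue
        results[url] = proxies[index]
    return results


# The scenario from the question: proxylist[0] times out at weblist[2].
def fake_fetch(url: str, proxy: str) -> None:
    if proxy == "58.12.12.12:80" and url == "https://www.yahoo.com":
        raise TimeoutError


proxylist = ["58.12.12.12:80", "69.12.12.12:80"]
weblist = ["https://www.google.com", "https://www.facebook.com",
           "https://www.yahoo.com", "https://aol.com"]
print(crawl_skip_on_timeout(weblist, proxylist, fake_fetch))
```

Here `https://www.yahoo.com` maps to `None` (skipped) and `https://aol.com` is loaded through `69.12.12.12:80`, which matches what the question asks for and illustrates that the failed URL is not retried.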

Just be careful: if it fails to load a URL, it switches to the next proxy and moves on to the next URL (as you requested), but it does not retry the URL that failed.

If you want it to switch proxies and retry the current URL when loading fails, use the following:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']


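def test():
    temp_count_proxy = 0   # index of the proxy currently in use
    driver_opened = False  # whether a PhantomJS instance is running
    for url in weblist:
        while True:
            # Use >= : once the index equals len(proxylist) there is no proxy left
            if temp_count_proxy >= len(proxylist):
                print("Out of proxy")
                return

            if not driver_opened:
                service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]), '--proxy-type=socks5']
                driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
                driver_opened = True

            try:
                driver.set_page_load_timeout(2)  # seconds
                driver.get(url)
                # Your code to process here

            except TimeoutException:
                # Shut down this browser, switch proxy, and retry the same URL
                driver.quit()
                driver_opened = False
                temp_count_proxy += 1
                continue

            break  # page loaded: go on to the next URL with the same proxy

    if driver_opened:
        driver.quit()  # clean up the last browser


test()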
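The retry variant can be checked the same way without a browser. This is a sketch under assumptions rather than the answer's exact code: `fetch(url, proxy)` is a hypothetical callable standing in for opening PhantomJS with that proxy and calling `driver.get(url)`, and the built-in `TimeoutError` stands in for Selenium's `TimeoutException`. On a timeout it advances to the next proxy and tries the same URL again, keeping the working proxy for later URLs.

```python
from typing import Callable, Dict, Optional, Sequence


def crawl_retry_on_timeout(
    urls: Sequence[str],
    proxies: Sequence[str],
    fetch: Callable[[str, str], None],
) -> Dict[str, Optional[str]]:
    """Visit URLs through the current proxy; on a timeout, switch proxy and retry the same URL."""
    results: Dict[str, Optional[str]] = {url: None for url in urls}
    index = 0  # plays the role of temp_count_proxy
    for url in urls:
        while index < len(proxies):
            try:
                fetch(url, proxies[index])  # hypothetical: load `url` through this proxy
            except TimeoutError:
                index += 1  # retry this URL with the next proxy
                continue
            results[url] = proxies[index]
            break  # loaded: keep this proxy for the next URL
        if index >= len(proxies):  # every remaining proxy failed on this URL
            print("Out of proxy")
            break
    return results


# Same scenario: proxylist[0] times out at weblist[2].
def fake_fetch(url: str, proxy: str) -> None:
    if proxy == "58.12.12.12:80" and url == "https://www.yahoo.com":
        raise TimeoutError


proxylist = ["58.12.12.12:80", "69.12.12.12:80"]
weblist = ["https://www.google.com", "https://www.facebook.com",
           "https://www.yahoo.com", "https://aol.com"]
print(crawl_retry_on_timeout(weblist, proxylist, fake_fetch))
```

With this fetcher, `https://www.yahoo.com` is retried and loaded through `69.12.12.12:80`, and `https://aol.com` stays on that proxy. With only one proxy, the crawl stops at yahoo and prints "Out of proxy".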

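Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM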