簡體   English   中英

Python硒PhantomJS代理

[英]Python selenium PhantomJS proxy

這是我的代碼:

from selenium import webdriver

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']
for s in range (len(proxylist)):
    service_args = ['--proxy=%s'%(proxylist[s]),'--proxy-type=socks5']
    driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
    for s in weblist:
        driver.get(s)

這個想法是瀏覽器首先將使用proxylist [0]轉到那些站點。 如果proxylist [0]在網站[2]上超時,則proxylist [1]將繼續對網站[3]進行處理。 我認為我應該使用try和,但不知道將它們放在哪里。 很高興您提供了幫助!

嘗試這樣的事情。 基本上,我們要切換內部和外部循環並添加try / except

for s in weblist:
    for s in range (len(proxylist)):
        try

            service_args = ['--proxy=%s'%(proxylist[s]),'--proxy-type=socks5']
            driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
            driver.get(s)
            break
        except TimeOutException:
            print 'timed out'

嘗試超時的嘗試是這樣的:

try:
    driver.set_page_load_timeout(1)
    driver.get("http://www.example.com")
except TimeoutException as ex:
    print("Exception has been thrown. " + str(ex))

對於您的代碼,添加它就像:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']


def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        if temp_count_proxy > len(proxylist):
            print("Out of proxy")
            return

        if driver_opened == 0:
            service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]),'--proxy-type=socks5']
            driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
            driver_opened = 1

        try:
            driver.set_page_load_timeout(2)
            driver.get(url)
        except TimeoutException as ex:
            driver.close()
            driver_opened = 0
            temp_count_proxy += 1
            continue

test()

請注意,好像無法獲取一個URL一樣,它將更改代理,並獲取下一個URL(根據您的要求),但不會獲取相同的URL。

如果您希望它在重試當前URL失敗后更改代理,請使用以下命令:

from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']


def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        while True:

            if temp_count_proxy > len(proxylist):
                print("Out of proxy")
                return

            if driver_opened == 0:
                service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]),'--proxy-type=socks5']
                driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
                driver_opened = 1

            try:
                driver.set_page_load_timeout(2)
                driver.get(url)
                # Your code to process here

            except TimeoutException as ex:
                driver.close()
                driver_opened = 0
                temp_count_proxy += 1
                continue

            break

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM