[英]Python selenium PhantomJS proxy
这是我的代码:
from selenium import webdriver
proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']
for s in range (len(proxylist)):
service_args = ['--proxy=%s'%(proxylist[s]),'--proxy-type=socks5']
driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
for s in weblist:
driver.get(s)
这个想法是浏览器首先将使用proxylist [0]转到那些站点。 如果proxylist [0]在网站[2]上超时,则proxylist [1]将继续对网站[3]进行处理。 我认为我应该使用try和,但不知道将它们放在哪里。 很高兴您提供了帮助!
尝试这样的事情。 基本上,我们要切换内部和外部循环并添加try / except
for s in weblist:
for s in range (len(proxylist)):
try
service_args = ['--proxy=%s'%(proxylist[s]),'--proxy-type=socks5']
driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
driver.get(s)
break
except TimeOutException:
print 'timed out'
尝试超时的尝试是这样的:
try:
driver.set_page_load_timeout(1)
driver.get("http://www.example.com")
except TimeoutException as ex:
print("Exception has been thrown. " + str(ex))
对于您的代码,添加它就像:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']
def test():
temp_count_proxy = 0
driver_opened = 0
for url in weblist:
if temp_count_proxy > len(proxylist):
print("Out of proxy")
return
if driver_opened == 0:
service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]),'--proxy-type=socks5']
driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
driver_opened = 1
try:
driver.set_page_load_timeout(2)
driver.get(url)
except TimeoutException as ex:
driver.close()
driver_opened = 0
temp_count_proxy += 1
continue
test()
请注意,好像无法获取一个URL一样,它将更改代理,并获取下一个URL(根据您的要求),但不会获取相同的URL。
如果您希望它在重试当前URL失败后更改代理,请使用以下命令:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
proxylist=['58.12.12.12:80','69.12.12.12:80']
weblist=['https://www.google.com','https://www.facebook.com','https://www.yahoo.com','https://aol.com']
def test():
temp_count_proxy = 0
driver_opened = 0
for url in weblist:
while True:
if temp_count_proxy > len(proxylist):
print("Out of proxy")
return
if driver_opened == 0:
service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]),'--proxy-type=socks5']
driver = webdriver.PhantomJS('phantomjs.exe', service_args = service_args)
driver_opened = 1
try:
driver.set_page_load_timeout(2)
driver.get(url)
# Your code to process here
except TimeoutException as ex:
driver.close()
driver_opened = 0
temp_count_proxy += 1
continue
break
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.