
Using multiple proxies to open a link in urllib2

What I am trying to do is read a line (an IP address) from a file, open a website through a proxy at that address, and then repeat for all the addresses in the file. Instead, I get an error. I am new to Python, so maybe it's a simple mistake. Thanks in advance!

CODE:

>>> import urllib2
>>> f = open("proxy.txt", "r")          # file containing the list of proxy addresses
>>> address = f.readline().strip()      # strip the trailing \n
>>> 
>>> while address:
        proxy = urllib2.ProxyHandler({'http': address})
        opener = urllib2.build_opener(proxy)
        urllib2.install_opener(opener)
        urllib2.urlopen('http://www.google.com')
        address = f.readline().strip()

ERROR:

Traceback (most recent call last):
  File "<pyshell#15>", line 5, in <module>
    urllib2.urlopen('http://www.google.com')
  File "D:\Programming\Python\lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "D:\Programming\Python\lib\urllib2.py", line 394, in open
    response = self._open(req, data)
  File "D:\Programming\Python\lib\urllib2.py", line 412, in _open
    '_open', req)
  File "D:\Programming\Python\lib\urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "D:\Programming\Python\lib\urllib2.py", line 1199, in http_open
    return self.do_open(httplib.HTTPConnection, req)
  File "D:\Programming\Python\lib\urllib2.py", line 1174, in do_open
    raise URLError(err)
URLError: <urlopen error [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond>

It means that the proxy is unavailable.
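Until you filter out the dead proxies, you can also make the loop itself tolerant of unreachable ones by catching URLError and moving on to the next address. A minimal sketch along the lines of the question's code (the 5-second timeout and the one-proxy-per-line proxy.txt format are assumptions):

import urllib2

with open("proxy.txt") as f:
    for line in f:
        address = line.strip()
        if not address:
            continue  # skip blank lines
        opener = urllib2.build_opener(urllib2.ProxyHandler({'http': address}))
        try:
            # send the request through this proxy; close the response right away
            opener.open('http://www.google.com', timeout=5).close()
        except urllib2.URLError as e:
            print("%s failed: %s" % (address, e))  # dead proxy: report and continue
        else:
            print("%s works" % address)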

Here's a proxy checker that checks several proxies simultaneously:

#!/usr/bin/env python
import fileinput # accept proxies from files or stdin

try:
    from gevent.pool import Pool # $ pip install gevent
    import gevent.monkey; gevent.monkey.patch_all() # patch stdlib
except ImportError: # fallback on using threads
    from multiprocessing.dummy import Pool

try:
    from urllib2 import ProxyHandler, build_opener
except ImportError: # Python 3
    from urllib.request import ProxyHandler, build_opener

def is_proxy_alive(proxy, timeout=5):
    opener = build_opener(ProxyHandler({'http': proxy})) # test redir. and such
    try: # send request, read response headers, close connection
        opener.open("http://example.com", timeout=timeout).close()
    except EnvironmentError:
        return None
    else:
        return proxy

candidate_proxies = (line.strip() for line in fileinput.input())
pool = Pool(20) # use 20 concurrent connections
for proxy in pool.imap_unordered(is_proxy_alive, candidate_proxies):
    if proxy is not None:
        print(proxy)

Usage:

$ python alive-proxies.py proxy.txt
$ echo user:password@ip:port | python alive-proxies.py
