
Urllib2 using Tor in Python

I'm trying to crawl websites with a crawler written in Python, and I want to integrate Tor so that the crawling is done anonymously.

I have found some answers on Stack Overflow, but none of them work for me.


Here is the first solution I found, from "Urllib2 using Tor and socks in python":

import socks
import socket
import urllib2
# 9050 is Tor's SOCKS port; declaring it an HTTP proxy is what triggers the 501 reply
socks.setdefaultproxy(socks.PROXY_TYPE_HTTP, "127.0.0.1", 9050)
socket.socket = socks.socksocket
print urllib2.urlopen('http://my-ip.herokuapp.com').read()

but I get the following error:

(501, 'Tor is not an HTTP Proxy')
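This error is expected: port 9050 is where Tor listens for SOCKS connections, so with PROXY_TYPE_HTTP the client speaks HTTP proxy protocol to a SOCKS server and Tor answers with the 501. A SOCKS5 client opens with a binary handshake instead. The sketch below (Python 3, standard library only; the helper names are hypothetical and for illustration) builds the two client messages defined in RFC 1928: the greeting offering "no authentication", and a CONNECT request that carries the hostname itself (address type 0x03) so that the proxy, not the local resolver, performs the DNS lookup.

```python
import struct

SOCKS_VERSION = 0x05
AUTH_NONE = 0x00        # "NO AUTHENTICATION REQUIRED"
CMD_CONNECT = 0x01
ATYP_DOMAINNAME = 0x03  # address field is a length-prefixed hostname


def socks5_greeting():
    """Method-selection message: version 5, one method offered, no authentication."""
    return bytes([SOCKS_VERSION, 1, AUTH_NONE])


def socks5_connect_request(host, port):
    """CONNECT request carrying the hostname unresolved, so the proxy resolves it."""
    name = host.encode("idna")
    if not 0 < len(name) <= 255:
        raise ValueError("hostname must be 1-255 bytes for SOCKS5")
    header = bytes([SOCKS_VERSION, CMD_CONNECT, 0x00, ATYP_DOMAINNAME, len(name)])
    return header + name + struct.pack("!H", port)  # port in network byte order


if __name__ == "__main__":
    print(socks5_greeting().hex())  # 050100
    print(socks5_connect_request("my-ip.herokuapp.com", 80).hex())
```

In practice there is no need to hand-roll this: selecting PROXY_TYPE_SOCKS5 makes the socks module send these messages for you.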

Then I tried the accepted answer from "How can I use a SOCKS 4/5 proxy with urllib2?":

import socks
import socket
# 8080 comes from a generic SOCKS answer; the tor log below shows its SOCKS listener on 9050
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 8080)
socket.socket = socks.socksocket
import urllib2
print urllib2.urlopen('http://www.google.com').read()

and I get the following error:

<urlopen error [Errno 111] Connection refused>
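Errno 111 (Connection refused) means nothing accepted the TCP connection on 127.0.0.1:8080 at all, so the request never reached Tor. A quick way to check which candidate ports are actually listening is a plain TCP connect. This is a Python 3 sketch; the helper name and the list of ports are illustrative assumptions (9050 is the tor daemon's default SOCKS port, 9150 is the one used by Tor Browser).

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers ConnectionRefusedError and timeouts
        return False


if __name__ == "__main__":
    # 9050: tor daemon default; 9150: Tor Browser; 8080: port from the generic answer
    for port in (9050, 9150, 8080):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(port, state)
```

A "closed" result for a port is enough to explain Errno 111 without involving the proxy code at all.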

Then I tried the top-voted answer from "Python urllib over TOR?":

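import socks
import socket

def create_connection(address, timeout=None, source_address=None):
    # Open every outgoing connection through the default SOCKS proxy
    # (timeout and source_address are accepted but ignored)
    sock = socks.socksocket()
    sock.connect(address)
    return sock

socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 9050)

# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection

import urllib2
print urllib2.urlopen('http://almien.co.uk/m/tools/net/ip/').read()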

My test URL is http://almien.co.uk/m/tools/net/ip/. The code above runs for 2 minutes and ends with the following error:

  File "/usr/lib/python2.7/dist-packages/socks.py", line 369, in connect
    self.__negotiatesocks5(destpair[0],destpair[1])
  File "/usr/lib/python2.7/dist-packages/socks.py", line 236, in __negotiatesocks5
    raise Socks5Error(ord(resp[1]),_generalerrors[ord(resp[1])])
IndexError: tuple index out of range
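The IndexError hides the real failure. The traceback shows SocksiPy building its exception with `_generalerrors[ord(resp[1])]`, that is, using the reply code Tor sent back as an index into a table of generic error messages; that table appears to be shorter than the range of codes a SOCKS5 server can return, so the lookup itself crashes. Decoding the code against the reply table in RFC 1928 (section 6) would reveal what Tor actually reported. A Python 3 sketch; `socks5_reply_message` is a hypothetical helper, not part of SocksiPy.

```python
# Reply codes (REP field) defined in RFC 1928, section 6.
SOCKS5_REPLIES = {
    0x00: "succeeded",
    0x01: "general SOCKS server failure",
    0x02: "connection not allowed by ruleset",
    0x03: "network unreachable",
    0x04: "host unreachable",
    0x05: "connection refused",
    0x06: "TTL expired",
    0x07: "command not supported",
    0x08: "address type not supported",
}


def socks5_reply_message(reply):
    """Describe a SOCKS5 reply header without risking an IndexError."""
    if len(reply) < 2:
        return "truncated reply ({} bytes)".format(len(reply))
    if reply[0] != 0x05:
        return "not a SOCKS5 reply (version byte {:#04x})".format(reply[0])
    code = reply[1]
    return SOCKS5_REPLIES.get(code, "unassigned reply code {:#04x}".format(code))
```

A long wait followed by a failure is also consistent with Tor not yet having built a circuit; the log in the update stops at "Bootstrapped 5%".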

Someone commented that the current port is 9150, not 9050, so I tried again with 9150 and got the following error:

urllib2.URLError: <urlopen error [Errno 111] Connection refused>

UPDATE

Here is the Tor information from my machine:

root@xxxxxxx:~# tor
Apr 22 14:14:39.818 [notice] Tor v0.2.4.20 (git-0d50b03673670de6) running on Linux with Libevent 2.0.21-stable and OpenSSL 1.0.1f.
Apr 22 14:14:39.818 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://www.torproject.org/download/download#warning
Apr 22 14:14:39.818 [notice] Read configuration file "/etc/tor/torrc".
Apr 22 14:14:39.820 [notice] Opening Socks listener on 127.0.0.1:9050
Apr 22 14:14:39.000 [notice] Parsing GEOIP IPv4 file /usr/share/tor/geoip.
Apr 22 14:14:39.000 [notice] Parsing GEOIP IPv6 file /usr/share/tor/geoip6.
Apr 22 14:14:39.000 [warn] You are running Tor as root. You don't need to, and you probably shouldn't.
Apr 22 14:14:39.000 [warn] OpenSSL version from headers does not match the version we're running with. If you get weird crashes, that might be why. (Compiled with 1000105f: OpenSSL 1.0.1e 11 Feb 2013; running with 1000106f: OpenSSL 1.0.1f 6 Jan 2014).
Apr 22 14:14:40.000 [notice] Bootstrapped 5%: Connecting to directory server.
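This log settles the 9050 vs 9150 question for this machine: "Opening Socks listener on 127.0.0.1:9050" names the address this tor daemon accepts SOCKS connections on (9150 belongs to Tor Browser's bundled tor). It also stops at "Bootstrapped 5%", and requests are not expected to succeed until tor reports 100% (Done). A small Python 3 sketch that pulls both facts out of such a log; the regular expressions are written against the lines shown here and are an assumption about the format, not a documented interface.

```python
import re

LISTENER_RE = re.compile(r"Opening Socks listener on ([\d.]+):(\d+)")
BOOTSTRAP_RE = re.compile(r"Bootstrapped (\d+)%")


def parse_tor_log(text):
    """Return ((host, port) of the SOCKS listener or None, last bootstrap percentage)."""
    listener = None
    progress = 0
    for line in text.splitlines():
        match = LISTENER_RE.search(line)
        if match:
            listener = (match.group(1), int(match.group(2)))
        match = BOOTSTRAP_RE.search(line)
        if match:
            progress = int(match.group(1))
    return listener, progress
```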

Start Tor, then run:

import socket
import urllib

import socks  # SocksiPy module

SOCKS_PORT = 9050  # default SocksPort of the tor daemon

# Route every new socket through Tor's SOCKS5 proxy
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, '127.0.0.1', SOCKS_PORT)
socket.socket = socks.socksocket

# Skip local DNS: return the hostname unresolved so that it is
# passed to Tor, which resolves it on the Tor side
def getaddrinfo(*args):
    return [(socket.AF_INET, socket.SOCK_STREAM, 6, '', (args[0], args[1]))]

socket.getaddrinfo = getaddrinfo

print urllib.urlopen('http://my-ip.herokuapp.com').read()
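The getaddrinfo override is what keeps DNS inside Tor: urllib would otherwise resolve the hostname locally before the SOCKS socket ever sees it, leaking the lookup and handing Tor a bare IP. The version above accepts any arguments and ignores the requested family and type. A slightly stricter Python 3 sketch of the same idea, mirroring the standard `getaddrinfo(host, port, family, type, proto, flags)` parameters; the function name is hypothetical.

```python
import socket


def passthrough_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Stand-in for socket.getaddrinfo that performs no DNS lookup.

    Returns one IPv4/TCP entry whose address is the unresolved hostname,
    so the SOCKS-wrapped socket receives the name and lets the proxy resolve it.
    """
    return [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", (host, port))]
```

It would be installed the same way as in the answer, with `socket.getaddrinfo = passthrough_getaddrinfo`. Note that this affects every lookup in the process, not only the crawler's.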

Based on the to_russia_with_love code from stem. If you also want to start Tor from Python, you should check out stem.
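For the stem route, the to_russia_with_love tutorial launches a private tor process with `stem.process.launch_tor_with_config` and a dictionary of torrc options. A Python 3 sketch along those lines, assuming the stem package and a `tor` binary on PATH; the helper names and the optional exit-country parameter are illustrative. It uses a port other than 9050 (7000 here, an arbitrary choice) so it does not collide with the system tor already listening on 9050 in the log above.

```python
def tor_config(socks_port=7000, exit_country=None):
    """Build the torrc-style options passed to stem (values must be strings)."""
    config = {"SocksPort": str(socks_port)}
    if exit_country:
        config["ExitNodes"] = "{%s}" % exit_country  # e.g. "{ru}"
    return config


def print_bootstrap_lines(line):
    """Echo only tor's bootstrap progress messages while it starts up."""
    if "Bootstrapped " in line:
        print(line)


def launch_tor(socks_port=7000, exit_country=None):
    """Start tor and block until it has bootstrapped; the caller must kill() it."""
    import stem.process  # third-party package: pip install stem

    return stem.process.launch_tor_with_config(
        config=tor_config(socks_port, exit_country),
        init_msg_handler=print_bootstrap_lines,
    )


if __name__ == "__main__":
    tor_process = launch_tor()
    try:
        print("tor is ready on 127.0.0.1:7000")  # point the SOCKS proxy here
    finally:
        tor_process.kill()
```

Because `launch_tor_with_config` waits for bootstrapping by default, requests made after it returns should not hit the half-started state seen in the log.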

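Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please cite this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.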

 
© 2020-2024 STACKOOM.COM