How to use Scrapy with Python and Tor over Privoxy in Docker Compose
I'm trying to run Scrapy with Python, Tor and Privoxy. I'm using the scraper from khpeek/privoxy-tor-scraper (https://github.com/khpeek/privoxy-tor-scraper). Here is my directory structure:
- docker-compose.yml
- privoxy
  - config
  - Dockerfile
- scraper
  - Dockerfile
  - newnym.py
  - requirements.txt
- tor
  - Dockerfile
I'm trying to run the following `docker-compose.yml`:
```yaml
version: '3'
services:
  privoxy:
    build: ./privoxy
    ports:
      - "8118:8118"
    links:
      - tor
  tor:
    build:
      context: ./tor
      args:
        password: "1234"
    ports:
      - "9050:9050"
      - "9051:9051"
  scraper:
    build: ./scraper
    links:
      - tor
      - privoxy
```
where the Dockerfile for Tor is:
```dockerfile
FROM alpine:3.7
EXPOSE 9050 9051
ARG password
RUN apk --update add tor
RUN echo "ControlPort 9051" >> /etc/tor/torrc
RUN echo "CookieAuthentication 1" >> /etc/tor/torrc
RUN echo "HashedControlPassword $(tor --quiet --hash-password $password)" >> /etc/tor/torrc
CMD ["tor"]
```
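The `HashedControlPassword` line stores the output of `tor --hash-password`, not the plain password. To illustrate what that command computes, here is a rough Python re-implementation, assuming Tor's scheme is an RFC 2440 iterated-and-salted S2K over SHA-1 with an 8-byte random salt and count byte `0x60`, serialized as `16:` plus uppercase hex of salt, count byte and digest. The function name `tor_hash_password` is hypothetical; inside the image you would still simply call `tor --hash-password`.

```python
import hashlib
import os


def tor_hash_password(password, salt=None, indicator=0x60):
    """Hypothetical re-implementation of `tor --hash-password` (RFC 2440 S2K)."""
    if salt is None:
        salt = os.urandom(8)  # assumed: Tor uses an 8-byte random salt
    # RFC 2440 count encoding: indicator 0x60 -> 65536 bytes fed into SHA-1
    count = (16 + (indicator & 15)) << ((indicator >> 4) + 6)
    block = salt + password.encode("utf-8")
    sha = hashlib.sha1()
    # Hash salt+password repeatedly until exactly `count` bytes have been consumed
    while count > 0:
        chunk = block[:count]
        sha.update(chunk)
        count -= len(chunk)
    return "16:" + (salt + bytes([indicator]) + sha.digest()).hex().upper()
```

Because a fresh salt is chosen on every call, two hashes of the same password differ; Tor only checks that the password sent with `AUTHENTICATE` reproduces the stored digest under the stored salt.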
the Dockerfile for Privoxy is:
```dockerfile
FROM alpine:latest
EXPOSE 8118
RUN apk --update add privoxy
COPY config /etc/privoxy/
#CMD ["privoxy", "--no-daemon"]
CMD ["privoxy", "--no-daemon", "/etc/privoxy/config"]
```
where `config` consists of two lines:
```
listen-address 0.0.0.0:8118
forward-socks5 / tor:9050 .
```
The Dockerfile for the scraper is:
```dockerfile
FROM python:3.6-alpine
ADD . /scraper
WORKDIR /scraper
RUN pip install --upgrade pip
RUN pip install -r requirements.txt
CMD ["python", "newnym.py"]
```
where `requirements.txt` contains the single line `requests`. Finally, the `newnym.py` program is meant simply to test whether changing the IP address with Tor works:
```python
from time import sleep, time

import requests as req
import telnetlib


def get_ip():
    IPECHO_ENDPOINT = 'http://ipecho.net/plain'
    HTTP_PROXY = 'http://privoxy:8118'
    return req.get(IPECHO_ENDPOINT, proxies={'http': HTTP_PROXY}).text


def request_ip_change():
    #tn = telnetlib.Telnet('privoxy',8118)
    tn = telnetlib.Telnet('tor', 9051)
    # In Python 3, telnetlib reads and writes bytes, not str.
    # Tor sends no greeting; "Escape character is '^]'." is printed by the
    # telnet command-line client, so there is nothing to wait for here.
    # The password must match the "password" build arg in docker-compose.yml.
    tn.write(b'AUTHENTICATE "1234"\r\n')
    tn.read_until(b"250 OK", 2)
    tn.write(b"signal NEWNYM\r\n")
    tn.read_until(b"250 OK", 2)
    tn.close()


if __name__ == '__main__':
    dts = []
    #isOpen('tor',9051)
    #isOpen('privoxy',8118)
    try:
        while True:
            ip = get_ip()
            t0 = time()
            request_ip_change()
            while True:
                new_ip = get_ip()
                if new_ip == ip:
                    sleep(1)
                else:
                    break
            dt = time() - t0
            dts.append(dt)
            print("{} -> {} in ~{}s".format(ip, new_ip, int(dt)))
    except KeyboardInterrupt:
        print("Stopping...")
        if dts:  # avoid ZeroDivisionError when stopped before the first change
            print("Average: {}".format(sum(dts) / len(dts)))
```
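`telnetlib` was deprecated in Python 3.11 and removed in 3.13, and Tor's control port speaks a simple line-based protocol (CRLF-terminated commands, success replies starting with `250`), so a plain socket is enough to send `NEWNYM`. A minimal sketch, assuming the control port is reachable at `tor:9051` and accepts the password `1234` from the build arg; `request_ip_change_socket` and `ControlError` are hypothetical names, and it raises instead of silently timing out when Tor rejects a command.

```python
import socket


class ControlError(RuntimeError):
    """Raised when Tor answers a control command with a non-250 reply."""


def _command(sock_file, line):
    # Send one CRLF-terminated command and read a single-line reply
    sock_file.write((line + "\r\n").encode("ascii"))
    sock_file.flush()
    reply = sock_file.readline().decode("ascii").rstrip("\r\n")
    if not reply.startswith("250"):
        raise ControlError("{!r} -> {!r}".format(line, reply))
    return reply


def request_ip_change_socket(host="tor", port=9051, password="1234", timeout=5):
    # Hypothetical telnetlib-free variant of request_ip_change();
    # assumes the password contains no quotes or backslashes (no escaping done)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rwb") as f:
            _command(f, 'AUTHENTICATE "{}"'.format(password))
            _command(f, "SIGNAL NEWNYM")
```

In `newnym.py` it would replace the call to `request_ip_change()` inside the loop.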
`docker-compose build` succeeds, but if I try `docker-compose up`, I get the following error message:
```
scraper_1_651fd6690a2d | Traceback (most recent call last):
scraper_1_651fd6690a2d |   File "newnym.py", line 45, in <module>
scraper_1_651fd6690a2d |     request_ip_change()
scraper_1_651fd6690a2d |   File "newnym.py", line 27, in request_ip_change
scraper_1_651fd6690a2d |     tn = telnetlib.Telnet('tor',9051)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/telnetlib.py", line 218, in __init__
scraper_1_651fd6690a2d |     self.open(host, port, timeout)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/telnetlib.py", line 234, in open
scraper_1_651fd6690a2d |     self.sock = socket.create_connection((host, port), timeout)
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/socket.py", line 724, in create_connection
scraper_1_651fd6690a2d |     raise err
scraper_1_651fd6690a2d |   File "/usr/local/lib/python3.6/socket.py", line 713, in create_connection
scraper_1_651fd6690a2d |     sock.connect(sa)
scraper_1_651fd6690a2d | ConnectionRefusedError: [Errno 111] Connection refused
```
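`ConnectionRefusedError` (errno 111, `ECONNREFUSED` on Linux) means the name `tor` did resolve, since a lookup failure would raise `socket.gaierror` instead, but nothing accepted connections on that address and port. The commented-out `isOpen('tor',9051)` calls in `newnym.py` suggest a port probe, but that helper is not defined in the post; below is a hypothetical sketch of one, renamed `is_open`, built on `socket.create_connection`.

```python
import socket


def is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical stand-in for the undefined isOpen() helper in newnym.py.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # ConnectionRefusedError, timeouts and DNS errors are all OSError
        return False
```

Called as `print(is_open('tor', 9051), is_open('privoxy', 8118))` from the scraper container, it shows which of the two services is unreachable without a traceback.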
I usually get this error when Tor has not been started. I think the solution might be to change the `CMD` in Tor's Dockerfile to:
```
rc-service tor start
```
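Compose's `links` (like `depends_on`) affects start order only; it does not wait until Tor is actually accepting connections, so the scraper may try `tor:9051` before Tor has opened it. One way to tolerate that start-up race is to poll the port before the first `request_ip_change()`, sketched here with a hypothetical `wait_for_port` helper. If the refusal persists after Tor is fully up, the poll will simply time out, which points at something other than start order (for instance, per the Tor manual, a `ControlPort` given without an address listens on localhost only).

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll until (host, port) accepts TCP connections or `timeout` expires.

    Hypothetical helper: returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, timed out or not resolvable yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, at the top of the `__main__` block: `if not wait_for_port('tor', 9051): raise SystemExit("Tor control port never came up")`.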