Scrapyd + Django in Docker: HTTPConnectionPool(host='0.0.0.0', port=6800) error
I'm a young Italian developer looking for help. I'm building a web interface for my web scraper using Django and Scrapyd. This is my first time using Scrapy, but thanks to the large amount of documentation online I'm learning quickly. However, I'm having a lot of trouble launching my spider through scrapyd_api.ScrapydAPI. Even though the server starts on the right port (both curl and browser requests work), Django raises requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=6800).
First, here is my folder structure:
scraper
├── admin.py
├── apps.py
├── dbs
│   └── default.db
├── __init__.py
├── items.py
├── logs
│   └── default
│       └── autoscout
│           ├── 0b2585dc6f2011eba4d30242ac140002.log
│           ├── 1fd803a66f2011eba4d30242ac140002.log
│           └── 6fac4d646f2111eba4d30242ac140002.log
├── middlewares.py
├── migrations
│   ├── 0001_initial.py
│   ├── 0002_auto_20210214_2019.py
│   ├── 0003_auto_search_token.py
│   ├── __init__.py
│   └── __pycache__
│       ├── 0001_initial.cpython-38.pyc
│       ├── [...]
│       └── __init__.cpython-38.pyc
├── models.py
├── pipelines.py
├── __pycache__
│   ├── admin.cpython-38.pyc
│   ├── [...]
│   └── views.cpython-38.pyc
├── serializers.py
├── settings.py
├── spiders
│   ├── AutoScout.py
│   ├── __init__.py
│   └── __pycache__
│       ├── AutoScout.cpython-38.pyc
│       └── __init__.cpython-38.pyc
├── urls.py
└── views.py
And my docker-compose.yml:
version: "3.9"
services:
  django:
    build: .
    command: python manage.py runserver 0.0.0.0:8000
    volumes:
      - .:/app
    ports:
      - "8000:8000"
  scrapyd:
    build: .
    command: bash -c "cd /app/scraper && scrapyd"
    volumes:
      - .:/app
    ports:
      - "6800:6800"
    tty: true
    stdin_open: true
    dns:
      - 8.8.8.8
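As an aside on the compose file above: both services sit on Compose's default network, where each container can resolve the other by its service name. A tiny sketch of the addresses involved (the helper function is hypothetical, not part of the project):

```python
def service_url(service: str, port: int) -> str:
    # Hypothetical helper: on the default Compose network, containers
    # resolve each other by service name ("django", "scrapyd"), not by
    # 0.0.0.0, which is only a listen address.
    return f"http://{service}:{port}"

# The address another container on the same network would use for scrapyd:
print(service_url("scrapyd", 6800))  # → http://scrapyd:6800
```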
And here is where we try to run the spider through scrapyd. Note that neither version (commented or uncommented) works on my system, while both curl and the browser do work:
from django.views.decorators.http import require_http_methods
from django.views.decorators.csrf import csrf_exempt
from django.http import HttpResponse
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapyd_api import ScrapydAPI
from uuid import uuid4
import requests
import os

scrapyd = ScrapydAPI('http://0.0.0.0:6800')

@csrf_exempt
@require_http_methods(["POST"])
def start_crawl(request):
    search_token = uuid4()
    settings = {
        'brand': request.POST['brand'],
        'model': request.POST['model'],
        'search_token': search_token,
    }
    task = scrapyd.schedule('default', 'autoscout', settings=settings)
    # task = requests.post('http://0.0.0.0:6800/schedule.json', {
    #     'project': 'default',
    #     'spider': 'autoscout',
    #     'brand': 'fiat',
    #     'model': '500',
    #     'search_token': search_token,
    # })
    return HttpResponse(task)
If you need it, here is my scrapy.cfg:
[settings]
default = settings
[deploy]
project = .
[scrapyd]
bind_address = 0.0.0.0
http_port = 6800
Finally, the overwhelming exception the code produces:
django_1 | Internal Server Error: /crawler-bot/run/
django_1 | Traceback (most recent call last):
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 169, in _new_conn
django_1 | conn = connection.create_connection(
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
django_1 | raise err
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
django_1 | sock.connect(sa)
django_1 | ConnectionRefusedError: [Errno 111] Connection refused
django_1 |
django_1 | During handling of the above exception, another exception occurred:
django_1 |
django_1 | Traceback (most recent call last):
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
django_1 | httplib_response = self._make_request(
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
django_1 | conn.request(method, url, **httplib_request_kw)
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
django_1 | super(HTTPConnection, self).request(method, url, body=body, headers=headers)
django_1 | File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
django_1 | self._send_request(method, url, body, headers, encode_chunked)
django_1 | File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
django_1 | self.endheaders(body, encode_chunked=encode_chunked)
django_1 | File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
django_1 | self._send_output(message_body, encode_chunked=encode_chunked)
django_1 | File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
django_1 | self.send(msg)
django_1 | File "/usr/local/lib/python3.8/http/client.py", line 950, in send
django_1 | self.connect()
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 200, in connect
django_1 | conn = self._new_conn()
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 181, in _new_conn
django_1 | raise NewConnectionError(
django_1 | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fef2fb73ee0>: Failed to establish a new connection: [Errno 111] Connection refused
django_1 |
django_1 | During handling of the above exception, another exception occurred:
django_1 |
django_1 | Traceback (most recent call last):
django_1 | File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
django_1 | resp = conn.urlopen(
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
django_1 | retries = retries.increment(
django_1 | File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 573, in increment
django_1 | raise MaxRetryError(_pool, url, error or ResponseError(cause))
django_1 | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fef2fb73ee0>: Failed to establish a new connection: [Errno 111] Connection refused'))
django_1 |
django_1 | During handling of the above exception, another exception occurred:
django_1 |
django_1 | Traceback (most recent call last):
django_1 | File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
django_1 | response = get_response(request)
django_1 | File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 181, in _get_response
django_1 | response = wrapped_callback(request, *callback_args, **callback_kwargs)
django_1 | File "/usr/local/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
django_1 | return view_func(*args, **kwargs)
django_1 | File "/usr/local/lib/python3.8/site-packages/django/views/decorators/http.py", line 40, in inner
django_1 | return func(request, *args, **kwargs)
django_1 | File "/app/scraper/views.py", line 25, in start_crawl
django_1 | task = scrapyd.schedule('default', 'autoscout', settings=settings)
django_1 | File "/usr/local/lib/python3.8/site-packages/scrapyd_api/wrapper.py", line 188, in schedule
django_1 | json = self.client.post(url, data=data, timeout=self.timeout)
django_1 | File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
django_1 | return self.request('POST', url, data=data, json=json, **kwargs)
django_1 | File "/usr/local/lib/python3.8/site-packages/scrapyd_api/client.py", line 37, in request
django_1 | response = super(Client, self).request(*args, **kwargs)
django_1 | File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
django_1 | resp = self.send(prep, **send_kwargs)
django_1 | File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
django_1 | r = adapter.send(request, **kwargs)
django_1 | File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
django_1 | raise ConnectionError(e, request=request)
django_1 | requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fef2fb73ee0>: Failed to establish a new connection: [Errno 111] Connection refused'))
django_1 | [15/Feb/2021 02:16:48] "POST /crawler-bot/run/ HTTP/1.1" 500 184852
Thanks to anyone willing to help me, I really appreciate it.
PS: I can of course update the question if more code is needed. I also apologize if my question isn't the best, but this is only my third one and I'm still learning this too.
bind_address = 0.0.0.0 means scrapyd listens on all interfaces, so it is reachable from outside the container. But 0.0.0.0 is a listen address, not one a client can connect to: from the Django container you have to reach scrapyd over the Compose network, i.e. ScrapydAPI('http://scrapyd:6800'), using the service name from docker-compose.yml. (localhost:6800 would only work if both processes ran in the same container or on the host.)
By the way, allowing public access to scrapyd is not good practice: anyone on the internet could deploy and run spiders on your server and abuse your system. Please enable ufw and block external access to port 6800.
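Concretely, a minimal sketch of the fix under these assumptions — the base URL points at the Compose service name scrapyd instead of 0.0.0.0; the SCRAPYD_URL environment variable and the helper names are illustrative, not part of the original project:

```python
import os
from uuid import uuid4

import requests

# Illustrative: default to the Compose service name, overridable via env var
# so the same code also works outside Docker.
SCRAPYD_URL = os.environ.get("SCRAPYD_URL", "http://scrapyd:6800")

def schedule_payload(brand: str, model: str) -> dict:
    # Payload for scrapyd's schedule.json endpoint; keys beyond project and
    # spider are passed through to the spider as arguments.
    return {
        "project": "default",
        "spider": "autoscout",
        "brand": brand,
        "model": model,
        "search_token": str(uuid4()),
    }

def start_crawl(brand: str, model: str) -> dict:
    # POST to scrapyd's schedule.json API and return its JSON reply
    # (e.g. {"status": "ok", "jobid": "..."}).
    resp = requests.post(f"{SCRAPYD_URL}/schedule.json",
                         data=schedule_payload(brand, model))
    return resp.json()
```

The same one-line change fixes the ScrapydAPI variant: ScrapydAPI('http://scrapyd:6800'). And since Django reaches scrapyd over the internal Compose network, you can drop the ports: "6800:6800" mapping for the scrapyd service entirely instead of firewalling it, which keeps scrapyd off the public interface altogether.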