
Scrapyd + Django in Docker: HTTPConnectionPool(host='0.0.0.0', port=6800) error

I'm an Italian guy looking for some help. I'm building a web interface for my web scraper using Django and scrapyd. This is my first time using Scrapy, but thanks to the large amount of documentation available online I'm learning quickly. However, I'm having a lot of trouble launching my spider through scrapyd_api.ScrapydAPI. Even though the server starts on the correct port (both curl and browser requests work), Django raises requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=6800).

First of all, here is my folder structure:

    scraper
    ├── admin.py
    ├── apps.py
    ├── dbs
    │   └── default.db
    ├── __init__.py
    ├── items.py
    ├── logs
    │   └── default
    │       └── autoscout
    │           ├── 0b2585dc6f2011eba4d30242ac140002.log
    │           ├── 1fd803a66f2011eba4d30242ac140002.log
    │           └── 6fac4d646f2111eba4d30242ac140002.log
    ├── middlewares.py
    ├── migrations
    │   ├── 0001_initial.py
    │   ├── 0002_auto_20210214_2019.py
    │   ├── 0003_auto_search_token.py
    │   ├── __init__.py
    │   └── __pycache__
    │       ├── 0001_initial.cpython-38.pyc
    │       ├── [...]
    │       └── __init__.cpython-38.pyc
    ├── models.py
    ├── pipelines.py
    ├── __pycache__
    │   ├── admin.cpython-38.pyc
    │   ├── [...]
    │   └── views.cpython-38.pyc
    ├── serializers.py
    ├── settings.py
    ├── spiders
    │   ├── AutoScout.py
    │   ├── __init__.py
    │   └── __pycache__
    │       ├── AutoScout.cpython-38.pyc
    │       └── __init__.cpython-38.pyc
    ├── urls.py
    └── views.py

And my docker-compose.yml:

    version: "3.9"
       
    services:
      django:
        build: .
        command: python manage.py runserver 0.0.0.0:8000
        volumes:
          - .:/app
        ports:
          - "8000:8000"
      
      scrapyd:
        build: .
        command: bash -c "cd /app/scraper && scrapyd"
        volumes:
            - .:/app
        ports:
            - "6800:6800"
        tty: true
        stdin_open: true
        dns:
            - 8.8.8.8
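
For reference, one way to confirm the daemon really is up is scrapyd's daemonstatus.json endpoint; a minimal check from the host, assuming the port mapping above:

    # Run on the Docker host: the scrapyd service publishes port 6800,
    # so the daemon is reachable at localhost:6800 from outside Docker.
    import requests

    resp = requests.get('http://localhost:6800/daemonstatus.json', timeout=5)
    print(resp.json())  # expect something like {"status": "ok", ...}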

And here is where I try to run the spider through scrapyd. Note that neither version (the commented-out one nor the active one) works on my system, while both curl and the browser do:

    from django.views.decorators.http import require_http_methods
    from django.views.decorators.csrf import csrf_exempt
    from django.http import HttpResponse

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings
    from scrapyd_api import ScrapydAPI

    from uuid import uuid4
    import requests
    import os

    scrapyd = ScrapydAPI('http://0.0.0.0:6800')

    @csrf_exempt
    @require_http_methods(["POST"])
    def start_crawl(request):
        # Token used to tie the scraped results back to this search.
        search_token = uuid4()

        settings = {
            'brand': request.POST['brand'],
            'model': request.POST['model'],
            'search_token': search_token,
        }
        task = scrapyd.schedule('default', 'autoscout', settings=settings)

        # Plain-requests equivalent of the call above (also fails):
        # task = requests.post('http://0.0.0.0:6800/schedule.json', {
        #     'project': 'default',
        #     'spider': 'autoscout',
        #     'brand': 'fiat',
        #     'model': '500',
        #     'search_token': search_token,
        # })

        return HttpResponse(task)
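
For what it's worth, scrapyd.schedule returns the job id as a string, so once scheduling works the job can be polled later; a sketch using python-scrapyd-api's job_status, assuming the same project and spider names as above:

    # job_status returns '', 'pending', 'running' or 'finished'
    # for the job id returned by scrapyd.schedule above.
    state = scrapyd.job_status('default', task)
    print(state)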

In case you need it, here is my scrapy.cfg:

    [settings]
    default = settings

    [deploy]
    project = .

    [scrapyd]
    bind_address = 0.0.0.0
    http_port = 6800

And finally, the overwhelming exception the code produces:

    django_1   | Internal Server Error: /crawler-bot/run/
    django_1   | Traceback (most recent call last):
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 169, in _new_conn
    django_1   |     conn = connection.create_connection(
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
    django_1   |     raise err
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
    django_1   |     sock.connect(sa)
    django_1   | ConnectionRefusedError: [Errno 111] Connection refused
    django_1   | 
    django_1   | During handling of the above exception, another exception occurred:
    django_1   | 
    django_1   | Traceback (most recent call last):
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    django_1   |     httplib_response = self._make_request(
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    django_1   |     conn.request(method, url, **httplib_request_kw)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
    django_1   |     super(HTTPConnection, self).request(method, url, body=body, headers=headers)
    django_1   |   File "/usr/local/lib/python3.8/http/client.py", line 1255, in request
    django_1   |     self._send_request(method, url, body, headers, encode_chunked)
    django_1   |   File "/usr/local/lib/python3.8/http/client.py", line 1301, in _send_request
    django_1   |     self.endheaders(body, encode_chunked=encode_chunked)
    django_1   |   File "/usr/local/lib/python3.8/http/client.py", line 1250, in endheaders
    django_1   |     self._send_output(message_body, encode_chunked=encode_chunked)
    django_1   |   File "/usr/local/lib/python3.8/http/client.py", line 1010, in _send_output
    django_1   |     self.send(msg)
    django_1   |   File "/usr/local/lib/python3.8/http/client.py", line 950, in send
    django_1   |     self.connect()
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 200, in connect
    django_1   |     conn = self._new_conn()
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 181, in _new_conn
    django_1   |     raise NewConnectionError(
    django_1   | urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fef2fb73ee0>: Failed to establish a new connection: [Errno 111] Connection refused
    django_1   | 
    django_1   | During handling of the above exception, another exception occurred:
    django_1   | 
    django_1   | Traceback (most recent call last):
    django_1   |   File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    django_1   |     resp = conn.urlopen(
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    django_1   |     retries = retries.increment(
    django_1   |   File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 573, in increment
    django_1   |     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    django_1   | urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='0.0.0.0', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fef2fb73ee0>: Failed to establish a new connection: [Errno 111] Connection refused'))
    django_1   | 
    django_1   | During handling of the above exception, another exception occurred:
    django_1   | 
    django_1   | Traceback (most recent call last):
    django_1   |   File "/usr/local/lib/python3.8/site-packages/django/core/handlers/exception.py", line 47, in inner
    django_1   |     response = get_response(request)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/django/core/handlers/base.py", line 181, in _get_response
    django_1   |     response = wrapped_callback(request, *callback_args, **callback_kwargs)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
    django_1   |     return view_func(*args, **kwargs)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/django/views/decorators/http.py", line 40, in inner
    django_1   |     return func(request, *args, **kwargs)
    django_1   |   File "/app/scraper/views.py", line 25, in start_crawl
    django_1   |     task = scrapyd.schedule('default', 'autoscout', settings=settings)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/scrapyd_api/wrapper.py", line 188, in schedule
    django_1   |     json = self.client.post(url, data=data, timeout=self.timeout)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
    django_1   |     return self.request('POST', url, data=data, json=json, **kwargs)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/scrapyd_api/client.py", line 37, in request
    django_1   |     response = super(Client, self).request(*args, **kwargs)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    django_1   |     resp = self.send(prep, **send_kwargs)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    django_1   |     r = adapter.send(request, **kwargs)
    django_1   |   File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    django_1   |     raise ConnectionError(e, request=request)
    django_1   | requests.exceptions.ConnectionError: HTTPConnectionPool(host='0.0.0.0', port=6800): Max retries exceeded with url: /schedule.json (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fef2fb73ee0>: Failed to establish a new connection: [Errno 111] Connection refused'))
    django_1   | [15/Feb/2021 02:16:48] "POST /crawler-bot/run/ HTTP/1.1" 500 184852

Thanks in advance to anyone willing to help me; I really appreciate it.

PS: I can of course update the question if more code is needed. I also apologize if my question isn't the best, but it's only my third one and I'm still learning this as well.

    bind_address = 0.0.0.0

means that scrapyd is reachable from external networks.

You need to use localhost:6800 in your app to connect to scrapyd.

By the way, allowing public access to scrapyd is not good practice: anyone in the world could deploy their projects to your server, run them, and abuse your system.

Please enable ufw and block access to port 6800 from the outside.
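
One caveat for the Docker Compose setup in the question: the django and scrapyd services run in separate containers, so localhost inside the django container refers to the django container itself, not to scrapyd. On Compose's default network, containers reach each other by service name instead. A minimal sketch of the adjusted client, assuming the service names from the docker-compose.yml above:

    from scrapyd_api import ScrapydAPI

    # Containers on a Compose network resolve each other by service name,
    # so "scrapyd" matches the service defined in docker-compose.yml.
    scrapyd = ScrapydAPI('http://scrapyd:6800')

From the host machine, localhost:6800 keeps working because the port is published in the ports mapping, which is why curl and the browser succeed while the django container cannot connect.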
