
Scrapy spider not executing close method in docker container

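I have a Flask app that runs a Scrapy spider. The app works fine on my development machine, but when I run it in a container, the spider's close method is not executed.

Here is the spider's code: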

# -*- coding: utf-8 -*-
import scrapy
from bs4 import BeautifulSoup
from scrapy.exceptions import CloseSpider


class ToScrapeCSSSpider(scrapy.Spider):
    name = "toscrape-css"
    start_urls = [
        'http://quotes.toscrape.com/',
    ]

    def parse(self, response):
        page_text = response.text
        # raise CloseSpider("Blocked")

        soup = BeautifulSoup(page_text, "lxml")
        # Treat the response as a sitemap if "xml" appears in its first 20 characters.
        if "xml" in page_text[:20].lower():
            # Follow every <loc> entry and parse it with this same callback.
            for link in soup.find_all("loc"):
                yield scrapy.Request(url=link.text, callback=self.parse)
        else:
            raise CloseSpider("I want to close it")

    def close(spider, reason):
        # Overrides Spider.close, which Scrapy connects to the spider_closed signal.
        print("Closing spider")
        # self.pbar.clear()
        # self.pbar.write('Closing {} spider'.format(spider.name))
        print("Spider closed")

Here is my Flask app in main.py:

import crochet
crochet.setup()     # initialize crochet

import json
import pandas as pd
from flask import render_template, jsonify, Flask, redirect, url_for, request, flash
from scrapy.crawler import CrawlerRunner, CrawlerProcess
import time
from datetime import datetime, timedelta
import grequests
from app2.articles_finder.spiders.test_spider import ToScrapeCSSSpider
from app2 import app2

# Not shown in the original post; assumed here to be a plain CrawlerRunner.
crawl_runner = CrawlerRunner()


@app2.route("/test_docker")
def test_docker():
    scrap_docker()
    return "Ok", 200


@crochet.run_in_reactor
def scrap_docker():
    # crawl() returns a Deferred that fires once the crawl has finished.
    eventual = crawl_runner.crawl(ToScrapeCSSSpider)
    eventual.addCallback(finished_docker)


def finished_docker(null):
    print("Scrapping is over in docker container")

Finally, here is my Dockerfile:

FROM phusion/baseimage:0.9.19

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]

ENV TERM=xterm-256color
ENV SCRAPPER_HOME=/app/links_finder
ENV PYTHON_VERSION="3.6.5"
ENV FRONT_ADDRESS=blabla



# Set the locale
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8


# Install necessary packages

RUN apt-get update && apt-get install -y \
    build-essential
#RUN apt-get update && apt-get install -y \
#    build-essential \


# Install core packages
#RUN apt-get update
RUN apt-get install -y build-essential checkinstall software-properties-common llvm cmake wget git nano nasm yasm zip unzip pkg-config \
    libreadline-gplv2-dev libncursesw5-dev libssl-dev libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

# Install Python 3.6.5
RUN wget https://www.python.org/ftp/python/${PYTHON_VERSION}/Python-${PYTHON_VERSION}.tar.xz \
    && tar xvf Python-${PYTHON_VERSION}.tar.xz \
    && rm Python-${PYTHON_VERSION}.tar.xz \
    && cd Python-${PYTHON_VERSION} \
    && ./configure \
    && make altinstall \
    && cd / \
    && rm -rf Python-${PYTHON_VERSION}

RUN apt-get install -y python3-pip

WORKDIR ${SCRAPPER_HOME}
COPY . ${SCRAPPER_HOME}
RUN ls

#COPY  run_gunicorn_app_2.py ${SCRAPPER_HOME}


RUN pip3 install -r requirements2.txt



RUN chmod 777 -R *


# Clean up
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py

EXPOSE 3456

ENTRYPOINT python3 run_gunicorn_app_2.py
#ENTRYPOINT python3 ${SCRAPPER_HOME}/run_gunicorn_app_2.py

The requirements2.txt file:

tqdm==4.19.4
APScheduler==3.6.1
Flask==1.0.2
Flask-Admin==1.3.0
Flask-Bcrypt==0.7.1
Flask-DebugToolbar==0.10.0
Flask-Login==0.3.2
Flask-Mail==0.9.1
Flask-Script==2.0.5
Flask-SQLAlchemy==2.1
Flask-WTF==0.12
Flask-redis==0.4.0
gunicorn==19.4.5
itsdangerous==0.24
pytz==2016.10
structlog==16.1.0
termcolor==1.1.0
WTForms==2.1
scrapy==1.6.0
grequests==0.4.0
#pandas==0.24
crochet==1.10.0
redis==3.3.8
beautifulsoup4==4.7.1
publicsuffixlist==0.7.1
PyMySQL==0.9.3

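When I run the docker container, this is what I get: [image]

Apparently the close method is not executed at all. Any hints? I have been stuck on this problem for quite a while, so any clue would be welcome. Thanks!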

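After a lot of debugging, it turned out there was no real problem: the output was simply not reaching the logs. I just had to add -u after python3, which makes Python's stdout and stderr unbuffered, to get the logging: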

ENTRYPOINT python3 -u run_gunicorn_app_2.py
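
The buffering effect can be reproduced outside Docker. The following is a minimal sketch, not the setup from the question: it assumes only the standard library, and the child scripts are hypothetical stand-ins for the spider's close method. Each child prints one line and then exits through os._exit(), which skips the usual flush of stdout, so a line still sitting in the buffer is lost. In the real container the process keeps running and the output is merely delayed; os._exit() is used here only to make the buffering visible.

```python
import os
import subprocess
import sys

# Hypothetical stand-ins for the spider's close method: each prints one line,
# then exits via os._exit(), which skips the normal flush of stdout buffers.
PLAIN_PRINT = 'import os; print("Closing spider"); os._exit(0)'
FLUSHED_PRINT = 'import os; print("Closing spider", flush=True); os._exit(0)'


def run_child(code, unbuffered=False):
    """Run `code` in a child interpreter with stdout piped (not a terminal)."""
    cmd = [sys.executable]
    if unbuffered:
        cmd.append("-u")  # same flag as in the ENTRYPOINT above
    cmd += ["-c", code]
    # Drop PYTHONUNBUFFERED so the parent's environment cannot mask the effect.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONUNBUFFERED"}
    result = subprocess.run(cmd, stdout=subprocess.PIPE, env=env,
                            universal_newlines=True)
    return result.stdout


buffered_output = run_child(PLAIN_PRINT)
unbuffered_output = run_child(PLAIN_PRINT, unbuffered=True)
flushed_output = run_child(FLUSHED_PRINT)

print("without -u:     ", repr(buffered_output))
print("with -u:        ", repr(unbuffered_output))
print("with flush=True:", repr(flushed_output))
```

When stdout is a pipe rather than a terminal, as it is for a container's log stream, Python buffers printed text in blocks, so the first child's line never leaves the buffer. Running with -u, or passing flush=True to print, pushes each message out immediately. Setting the PYTHONUNBUFFERED environment variable to a non-empty value has the same effect as -u and may be more convenient in a Dockerfile.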

