Run scrapy from Flask application

I have a crawler which I want to run every time a person goes to the link. Since all the other modules are in Flask, I was told to build this in Flask as well. I have installed scrapy and selenium both in the virtual environment and globally on the machine as root.

When I run the crawler through the terminal, everything works fine. When I start the Flask application and visit xx.xx.xx.xx:8080/whats in the browser, that also works fine: it runs my crawler and gets me the file. But as soon as I go live, so that anyone can go to the link, the browser shows an internal server error.

In order to run the crawler, you have to type "scrapy crawl whateverthespidernameis" in the terminal, so I did this using Python's os module.

Here is my Flask code:

import os
from flask import Flask, send_file
from main import *
from test123 import *

app = Flask(__name__)

@app.route('/whats')
def whats():
    os.chdir("/var/www/myapp/whats")  # move into the scrapy project directory
    # cmd = "scrapy crawl whats"
    cmd = "sudo scrapy crawl whats"
    os.system(cmd)
    return send_file("/var/www/myapp/staticcsv/whats.csv", as_attachment=True)

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=8080, debug=True)

This is the error recorded in the log file when I go through the live link:

sh: 1: scrapy: not found

This is the error recorded in the log file when I use sudo in the command (the cmd variable):

sudo: no tty present and no askpass program specified

I am using uwsgi and nginx.

How can I run this crawler so that whenever anyone goes to "xx.xx.xx.xx/whats", the crawler runs and returns the csv file?

When you use sudo, the shell it starts will ask for a password on the tty - it specifically does not read standard input for this information. Since flask and other web applications typically run detached from a terminal, sudo has no way to ask for a password, so it looks for a program that can provide the password. You can find more information on this topic in this answer.

The reason you aren't finding scrapy is most likely a difference in $PATH between the interactive shells you used for testing and the process that runs flask. The easiest way to get around this is to give the full path to the scrapy program in your command.
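For example, here is a minimal sketch of the route using subprocess with an absolute path and without sudo. The path /var/www/myapp/venv/bin/scrapy is an assumption - substitute the output of "which scrapy" from the shell where the crawl works:

    import subprocess
    from flask import Flask, send_file

    app = Flask(__name__)

    # Assumed location of the scrapy executable inside the project's
    # virtual environment; replace with the output of "which scrapy"
    # from the shell where the crawl succeeds.
    SCRAPY_BIN = "/var/www/myapp/venv/bin/scrapy"

    @app.route('/whats')
    def whats():
        # The absolute path means $PATH no longer matters, and dropping
        # sudo avoids the tty/askpass problem entirely.
        subprocess.run([SCRAPY_BIN, "crawl", "whats"],
                       cwd="/var/www/myapp/whats", check=True)
        return send_file("/var/www/myapp/staticcsv/whats.csv",
                         as_attachment=True)

Passing cwd to subprocess.run also keeps the working-directory change local to that one call, whereas os.chdir changes the directory for the whole worker process.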
