
Start scrapy from Flask route

I want to build a crawler that takes the URL of a webpage to scrape and returns the result to a webpage. Right now I start scrapy from the terminal and store the response in a file. How can I start the crawler when some input is POSTed to a Flask route, process it, and return the response?
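For reference, the terminal workflow described in the question is typically a single command; the spider name and output file here are placeholders:

scrapy crawl myspider -o output.json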

You need to create a CrawlerProcess inside your Flask application and run the crawl programmatically. See the docs.

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start() # The script will block here until the crawl is finished
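To wire this into a Flask route, a minimal sketch could look like the following. This is illustrative, not the answer's exact code: the spider body, the /crawl endpoint, and the results list are all assumptions. Note also that the Twisted reactor underneath CrawlerProcess can only be started once per process, so this handles a single request per worker; that limitation is one reason for the task-queue advice below.

from flask import Flask, request, jsonify
import scrapy
from scrapy.crawler import CrawlerProcess

app = Flask(__name__)

class MySpider(scrapy.Spider):
    name = 'myspider'

    def __init__(self, start_url=None, results=None, **kwargs):
        super().__init__(**kwargs)
        self.start_urls = [start_url]
        self.results = results  # shared list the Flask view reads after the crawl

    def parse(self, response):
        # Collect whatever you need; the page <title> is just a stand-in
        self.results.append({'url': response.url, 'title': response.css('title::text').get()})

@app.route('/crawl', methods=['POST'])
def crawl():
    results = []
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(MySpider, start_url=request.form['url'], results=results)
    process.start()  # blocks until the crawl finishes; the reactor cannot be
                     # restarted, so a second request to this route will fail
    return jsonify(results)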

Before moving on with your project, I advise you to look into a Python task queue (like rq). This will allow you to run Scrapy crawls in the background, so your Flask application will not freeze while a scrape is running.
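A sketch of that approach, assuming a running Redis server and rq's default queue; the job function, spider name, and endpoint are placeholders. Running the crawl through the scrapy CLI in a subprocess gives each job a fresh reactor, which sidesteps the restart problem:

# tasks.py -- executed by an `rq worker` process, not by Flask itself
import subprocess

def run_crawl(url):
    # A fresh subprocess per job avoids the one-reactor-per-process limit
    subprocess.run(
        ['scrapy', 'crawl', 'myspider', '-a', f'start_url={url}', '-o', 'output.json'],
        check=True,
    )

# In the Flask app: enqueue the job and return immediately instead of blocking
from redis import Redis
from rq import Queue

queue = Queue(connection=Redis())

@app.route('/enqueue', methods=['POST'])
def enqueue():
    job = queue.enqueue(run_crawl, request.form['url'])
    return {'job_id': job.get_id()}, 202

The client can then poll the job's status with rq (or fetch the output file) once the worker finishes, instead of holding an HTTP request open for the whole crawl.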
