
Multithreading with Flask

I'd like to call generate_async_audio_service from a view and have it asynchronously generate audio files for the list of words using a threading pool and then commit them to a database.

I keep running into an error saying I'm working outside of the application context, even though I'm creating a new polly and s3 instance each time.

How can I generate/upload multiple audio files at once?

from flask import current_app
from multiprocessing.pool import ThreadPool
from Server.database import db
import boto3
import io
import uuid


def upload_audio_file_to_s3(file):
   app = current_app._get_current_object()
   with app.app_context():
      s3 = boto3.client(service_name='s3',
               aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),
               aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'))
      extension = file.filename.rsplit('.', 1)[1].lower()
      file.filename = f"{uuid.uuid4().hex}.{extension}"
      s3.upload_fileobj(file,
         app.config.get('S3_BUCKET'),
         f"{app.config.get('UPLOADED_AUDIO_FOLDER')}/{file.filename}",
         ExtraArgs={"ACL": 'public-read', "ContentType": file.content_type})
      return file.filename

def generate_polly(voice_id, text):
   app = current_app._get_current_object()
   with app.app_context():
      polly_client = boto3.Session(
         aws_access_key_id=app.config.get('BOTO3_ACCESS_KEY'),                   
         aws_secret_access_key=app.config.get('BOTO3_SECRET_KEY'),
         region_name=app.config.get('AWS_REGION')).client('polly')
      response = polly_client.synthesize_speech(VoiceId=voice_id,
                     OutputFormat='mp3', Text=text)
      return response['AudioStream'].read()


def generate_polly_from_term(vocab_term, gender='m'):
   app = current_app._get_current_object()
   with app.app_context():
      audio = generate_polly('Celine', vocab_term.term)
      file = io.BytesIO(audio)
      file.filename = 'temp.mp3'
      file.content_type = 'mp3'
      return vocab_term.id, upload_audio_file_to_s3(file)

def generate_async_audio_service(terms):
   pool = ThreadPool(processes=12)
   results = pool.map(generate_polly_from_term, terms)
   # do something w/ results
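A note on where the error comes from: current_app is a thread-local proxy, so it is unbound inside the pool's worker threads, and calling current_app._get_current_object() there is what raises the context error. A minimal sketch of the usual fix (placeholder app and config values, no real AWS calls) is to resolve the real app object once in the request thread and push a context inside each worker:

```python
from functools import partial
from multiprocessing.pool import ThreadPool

from flask import Flask

app = Flask(__name__)
app.config['S3_BUCKET'] = 'my-bucket'   # placeholder config value

def process_term(flask_app, term):
    # current_app is only bound in the request thread, so each worker
    # pushes its own context from the real app object it was handed.
    with flask_app.app_context():
        bucket = flask_app.config['S3_BUCKET']
        return f"{bucket}/{term}"

def generate_async_audio_service(terms):
    # Capture the real app object once, before spawning threads
    # (inside a view this would be current_app._get_current_object()),
    # and pass it to every worker explicitly.
    with ThreadPool(processes=4) as pool:
        return pool.map(partial(process_term, app), terms)

results = generate_async_audio_service(['bonjour', 'merci'])
print(results)  # ['my-bucket/bonjour', 'my-bucket/merci']
```

The same pattern would apply to the polly/s3 helpers above: pass the captured app in as an argument instead of calling current_app inside the thread.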

This is not necessarily a fleshed-out answer, but rather than putting things into comments I'll put it here.

Celery is a task manager for Python. The reason you would want to use it is that if tasks hit Flask faster than they can finish, some tasks will be blocked and you won't get all of your results. To fix this, you hand the work to another process. It goes like so:

1) Client sends a request to Flask to process audio files

2) The files land in Flask to be processed; Flask will send an asynchronous task to Celery.

3) Celery is notified of the task and stores its state in some sort of messaging system (RabbitMQ and Redis are the canonical examples)

4) Flask is now unburdened from that task and can receive more

5) Celery finishes the task, including the upload to your database

Celery and Flask are then two separate Python processes communicating with one another. That should satisfy your multithreaded approach. You can also retrieve a task's state through Flask if you want the client to verify that the task was/was not completed. The route in your Flask app.py would look like:

@app.route('/my-route', methods=['POST'])
def process_audio():
    # Get your files and save to common temp storage
    save_my_files(target_dir, files)

    response = celery_app.send_task('celery_worker.files', args=[target_dir])
    return jsonify({'task_id': response.task_id})

Where celery_app comes from another module worker.py:

import os
from celery import Celery

env = os.environ

# This is for a rabbitMQ backend
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL', 'amqp://0.0.0.0:5672/0')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')

celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)

Then, your Celery process would have a worker configured something like:

import os

from celery import Celery
from celery.signals import after_task_publish

env = os.environ
CELERY_BROKER_URL = env.get('CELERY_BROKER_URL')
CELERY_RESULT_BACKEND = env.get('CELERY_RESULT_BACKEND', 'rpc://')

# Set celery_app with name 'tasks' using the above broker and backend
celery_app = Celery('tasks', broker=CELERY_BROKER_URL, backend=CELERY_RESULT_BACKEND)

@celery_app.task(name='celery_worker.files')
def async_files(path):
    # Get file from path
    # Process
    # Upload to database
    # This is just if you want to return an actual result, you can fill this in with whatever
    return {'task_state': "FINISHED"}
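To tie it together, you run the broker, the Celery worker, and Flask as three separate processes. Assuming the worker module above is saved as celery_worker.py (matching the registered task name celery_worker.files), roughly:

```shell
# start a RabbitMQ broker (here via Docker; any local install works)
docker run -d -p 5672:5672 rabbitmq:3

# start the Celery worker against the app defined in celery_worker.py
celery -A celery_worker.celery_app worker --loglevel=info

# start the Flask app in another shell
flask run
```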

This is relatively basic, but could serve as a starting point. I will say that some of Celery's behavior and setup is not always the most intuitive, but this will leave your flask app available to whoever wants to send files to it without blocking anything else.

Hopefully that's somewhat helpful
