
How can I avoid a Gateway Timeout on a webpage that calls a slow Python CGI script?

I have a LAMP server set up in EC2. A simple website hosted on this web server in /var/www/html/ allows a user to upload an audio file of people having a discussion via an input form:

<form action="../cgi-bin/store_mp3_view" method="post" accept-charset="utf-8" enctype="multipart/form-data">
    <label for="mp3">Audio file</label>
    <input type="file" id="mp3" name="filename" />
    <input type="submit" value="Upload" />
</form>

This audio file gets stored in /tmp/. As you can see, the form triggers a Python script I have in cgi-bin. Here is the script: http://pastebin.com/iNU6WSUV . The script uploads the audio file from my web server to an API by Honda, which detects utterances and produces an audio file for each utterance as well as a JSON object containing metadata for each utterance. It appears the utterance files, and the JSON for each utterance, can be fetched separately from Honda's API: https://api.hark.jp/docs/en/05_reference_webapi.html .

My script waits for all of this processing to complete (all utterances processed and ready), then retrieves each audio file and sends it to the Bing Speech API to get the text from speech. I do this because I want to play each utterance's audio file, with its associated text and metadata, in the browser in the order the conversation happened. A player, if you will.

The problem is that all of this takes too long: it can take several minutes, and the browser receives a gateway timeout from the CGI script. Specifically, Hark takes a while to return the complete results of the audio analysis, but it appears I can query their API and retrieve intermediate results, as mentioned earlier. However, the utterances don't finish in order, so utterance 3 may be ready before utterance 2, but I need to show 2 before 3 because a conversation has an order.

What is the best way to go about building an app that can do this? How can I background these API calls so they don't block and cause a timeout? Should I be using something like Flask for this web app? How can I render the results in the webpage as I iteratively poll and retrieve them from Hark? Is CGI the wrong tool for the job? Thanks.

Generally, the way to handle a long delay is to send partial data to the client as you go, for example by yielding chunks. Instead of calling obj.wait(), use a loop that checks whether the status is finished and, if it is not, prints something like "..." and sleeps for one second. This way you will not get a timeout.
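A minimal sketch of that loop, assuming the script can cheaply ask whether the remote job is done. The names `stream_until_done` and `is_finished` are hypothetical and not taken from the pastebin script; `is_finished` stands in for whatever status check the API offers. Note that the web server or a proxy in front of it may buffer output, in which case the dots would not reach the browser in time to help.

```python
import sys
import time


def stream_until_done(is_finished, out=sys.stdout, interval=1.0,
                      max_wait=600.0, sleep=time.sleep):
    """Write a '.' every `interval` seconds until is_finished() returns True.

    Returns True if the job finished, False if `max_wait` seconds elapsed first.
    `is_finished` is a hypothetical callable wrapping the real status check.
    """
    # CGI response header, followed by the blank line that ends the headers.
    out.write("Content-Type: text/plain\n\n")
    out.flush()

    waited = 0.0
    while not is_finished():
        if waited >= max_wait:
            out.write("\ntimed out\n")
            out.flush()
            return False
        # Send a small chunk so the connection does not look idle.
        out.write(".")
        out.flush()  # without flushing, the output may sit in a local buffer
        sleep(interval)
        waited += interval

    out.write("\ndone\n")
    out.flush()
    return True
```

In the CGI script this would replace the blocking wait, for example `stream_until_done(lambda: job_is_ready())`, where `job_is_ready` is again a placeholder for the real check.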

While Ali Nikneshan's answer was helpful, it seems CGI is not the right tool for the job. I decided to stop using a LAMP stack and CGI apps and set up a Tornado web server with WebSockets, which lets me easily make async calls, run background tasks, and use coroutines to set up a data pipeline that polls the API endpoint and feeds the data into the browser.

This presentation was quite helpful for understanding coroutines:

http://www.dabeaz.com/coroutines/Coroutines.pdf .

And for Tornado:

http://www.tornadoweb.org/en/stable/index.html .
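The coroutine pipeline described in this answer could look roughly like the sketch below. It uses asyncio from the standard library rather than Tornado itself so that it runs without extra dependencies, and it is not the code actually used. `fetch_ready` and `send` are hypothetical stand-ins: the first is assumed to return a dict of the utterances that are ready so far, keyed by their position in the conversation, and the second is assumed to push one item to the browser (in Tornado that would likely be a WebSocket handler's `write_message`). The consumer holds back utterances that finish early so they reach the browser in conversation order.

```python
import asyncio


async def poll_results(fetch_ready, total, queue, interval=1.0):
    """Poll fetch_ready() and queue each newly finished item exactly once."""
    seen = set()
    while True:
        ready = await fetch_ready()  # assumed shape: {index: item, ...}
        for index in sorted(ready):
            if index not in seen:
                seen.add(index)
                await queue.put((index, ready[index]))
        if len(seen) >= total:
            break
        await asyncio.sleep(interval)  # yields control; nothing is blocked
    await queue.put(None)  # sentinel: no more items will arrive


async def forward_in_order(queue, send, start=0):
    """Send items in index order, holding back any that arrive early."""
    pending = {}
    next_index = start
    while True:
        message = await queue.get()
        if message is None:
            break
        index, item = message
        pending[index] = item
        # Release every item that is now contiguous with what was already sent.
        while next_index in pending:
            await send(pending.pop(next_index))
            next_index += 1


async def run_pipeline(fetch_ready, total, send, interval=1.0, start=0):
    """Run the poller and the forwarder concurrently until all items are sent."""
    queue = asyncio.Queue()
    await asyncio.gather(
        poll_results(fetch_ready, total, queue, interval),
        forward_in_order(queue, send, start),
    )
```

Because the poller sleeps with `await asyncio.sleep(...)` instead of blocking, the server stays free to handle other requests while the analysis runs, which is the main thing a CGI script cannot do.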

