How do I connect the browser's microphone to my Flask app?

I am using the speech_recognition module to recognize a spoken search query and then open a Google Chrome page showing the results for that query. Basically, it's a replacement for Google voice search, but it's started from the terminal. I want to turn this into a web app, so I created a Flask app:

-Search(directory)

-search.py (opens a tab using terminal directly/works independently)

-app.py (main flask app)

-static(directory)

-templates (directory)

But since the app is hosted on the server, my search.py takes input from the server's mic (in this case that's my PC's mic, but on AWS it won't work). How do I take input from the client's browser and use it in search.py? Should I delete this file and put the logic directly in my main app? What is the most effective way to implement this functionality?

Here is my search.py script if anyone wants to know: It works through the terminal.

import subprocess

import speech_recognition as sr

browser_exe_path = "..."

r=sr.Recognizer()
with sr.Microphone() as source:
    print("Listening!")
    audio=r.listen(source)

    try:
        s_name=r.recognize_google(audio)
        """
        Code to open browser and search the query
        """
    except (sr.UnknownValueError, sr.RequestError):
        print("Error!")

These two would probably be the best ways:

  • make a module/package of your own speech recognition tool and import it into your flask app
  • integrate the functionality itself into the app.

If you plan to reuse the speech recognition elsewhere, it might be a good idea to keep it separate from the web app. On the other hand, you can customise it much more if you integrate it with, for example, the view functions of your application. In both cases, you should put all of your search.py logic in a function or class so that you can call it. Otherwise, if you import it as it is now, it will run immediately on import.
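
As a rough sketch of that last point, search.py could be reorganised along these lines. The function names, the Google search URL and the `--listen` flag are hypothetical choices made for illustration; the code that actually opens the browser is left out, as in the question.

```python
# search.py (sketch): the logic lives in functions, so importing it has no side effects
import sys
from urllib.parse import quote_plus


def build_search_url(query):
    """Return a Google search URL for the recognized query (hypothetical helper)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def recognize_from_microphone():
    """Listen on the local microphone and return the recognized text, or None on failure."""
    import speech_recognition as sr  # imported here so the module loads without a microphone

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Listening!")
        audio = recognizer.listen(source)
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None


if __name__ == "__main__" and "--listen" in sys.argv:
    # Hypothetical flag: `python search.py --listen` keeps the terminal behaviour.
    text = recognize_from_microphone()
    print(build_search_url(text) if text else "Error!")
```

With this layout, `import search` in app.py only defines the functions, and nothing touches the microphone until one of them is called.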

Either way, you need a workflow that looks something like this:

  1. The user submits some speech, either live, recorded, or as a file. We'll call this speech file speech.wav (or any other file type, your choice)
  2. speech.wav is read and parsed by your speech recognition tool. It might return a list of words, or maybe just a string. We'll call this output.
  3. output is returned to the webpage and rendered as something for the user to read.
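
Steps 2 and 3 could be sketched in Python roughly as follows. This assumes the upload is a WAV file object; `transcribe_wav` and `handle_speech` are hypothetical names, and the recognizer is passed in as a parameter so the flow can be tried without the speech_recognition package installed.

```python
def transcribe_wav(file_obj):
    """Step 2: return the recognized text for a WAV file object, or None on failure."""
    import speech_recognition as sr  # third-party package, imported lazily

    recognizer = sr.Recognizer()
    with sr.AudioFile(file_obj) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio)
    except (sr.UnknownValueError, sr.RequestError):
        return None


def handle_speech(file_obj, transcribe=transcribe_wav):
    """Steps 2 and 3: turn the submitted audio into the data handed back to the page."""
    output = transcribe(file_obj)
    if output is None:
        return {"result": "fail"}
    return {"result": "success", "text": output}
```

Keeping the recognizer behind a parameter also makes it easy to swap in a different engine later without touching the web-facing code.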

I suggest starting with a form submission, and once you get that to work, you can try live speech recognition with AJAX. Start basic and just ask the user to add an audio file or record one. The following markup will open the file browser on desktop, or prompt the user to record on iOS or Android.

  <input name="audio-recording" type="file" accept="audio/*" id="audio-recording" capture>
  <label for="audio-recording">Add Audio</label>

  <p id="output"></p>

So once they've chosen a file, you need to access it. You may want to customise it, but here is a basic script that picks up the audio selected above. Credit for this script goes to Google Developers.

<script>
  const recorder = document.getElementById('audio-recording');

  recorder.addEventListener('change', function(e) {
    const file = e.target.files[0];
    const url = URL.createObjectURL(file);
    // Do something with the audio file.
    
  });
</script>

Where it says // Do something with the audio file, it might be a cool idea to make an AJAX request that returns the recognized sentence. But this is where it gets tricky: the value stored in the constant url is a blob: object URL that only exists inside the browser tab that created it, so Flask cannot read the audio from it. Instead, send the file itself in a POST request and read it from request.files on the server, for example:

from flask import request, jsonify
import search  # this is your own search.py that you mentioned in your question

@app.route("/process_audio", methods=["POST"])
def process_audio():
    audio_file = request.files.get("audio")  # the file uploaded by the browser
    if audio_file is None:
        return jsonify(result="fail")
    text = search.a_function(audio_file)  # returns the text from the audio, which you've done, so I've omitted code
    if text is not None:
        return jsonify(result="success", text=text)
    else:
        return jsonify(result="fail")

This returns data in JSON format, which acts as the bridge between client-side JavaScript and server-side Python. It might look something like this:

{
 "result":"success",
 "text":"This is a test voice recording"
}

Then, you need some jQuery (or any other JavaScript library, but jQuery is nice and easy) to manage the AJAX call:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script type="text/javascript">
  const recorder = document.getElementById('audio-recording');

  recorder.addEventListener('change', function(e) {
    const formData = new FormData();
    formData.append('audio', e.target.files[0]);
    $.ajax({
      url: '/process_audio',
      type: 'POST',
      data: formData,
      processData: false,  // send the FormData object as-is
      contentType: false,  // let the browser set the multipart boundary
      success: function(data) {
        $("#output").text(data.text);
      }
    });
  });
</script>

That sends a POST request containing the audio file to "/process_audio", which returns the JSON we saw earlier, and then writes the "text" field of the JSON into the element matched by the "#output" selector.

There may be some debugging needed, but that seems to do the trick.
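
One thing worth checking while debugging is the format of the upload: speech_recognition's AudioFile reads WAV, AIFF and FLAC, while phones often record in other containers. Below is a minimal sketch of a check based on the first bytes of the file. The function names are hypothetical, and it assumes the upload behaves like a seekable binary file object.

```python
import io


def sniff_audio_format(header):
    """Guess the container format from the first 12 bytes of a file, or return None."""
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"FORM" and header[8:12] in (b"AIFF", b"AIFC"):
        return "aiff"
    if header[:4] == b"fLaC":
        return "flac"
    return None


def is_supported_upload(file_obj):
    """Peek at the upload, then rewind it so it can still be read afterwards."""
    header = file_obj.read(12)
    file_obj.seek(0)
    return sniff_audio_format(header) is not None


# Usage with an in-memory stand-in for an uploaded file
sample = io.BytesIO(b"RIFF\x24\x00\x00\x00WAVEfmt ")
print(is_supported_upload(sample))
```

If the check fails, the view can return the "fail" result early, or the audio can be converted to WAV before it reaches the recognizer.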
