Save microphone audio input when using Azure speech to text

I'm currently using Azure speech to text in my project. It recognizes speech input directly from the microphone (which is what I want) and saves the text output, but I'm also interested in saving that audio input so that I can listen to it later on. Before moving to Azure I was using the Python speech_recognition library with recognize_google, which allowed me to use get_wav_data() to save the input as a .wav file. Is there something similar I can use with Azure? I read the documentation but could only find ways to save audio files for text to speech. My temporary solution is to save the audio input myself first and then run Azure STT on that audio file rather than using the microphone directly, but I'm worried this will slow down the process. Any ideas? Thank you in advance!

This is Darren from the Microsoft Speech SDK Team. Unfortunately, at the moment there is no built-in support for simultaneously doing live recognition from a microphone and writing the audio to a WAV file. We have heard this customer request before and we will consider adding this feature in a future version of the Speech SDK.

What I think you can do at the moment (it will require a bit of programming on your part) is use the Speech SDK with a push stream. You can write code to read audio buffers from the microphone and write them to a WAV file. At the same time, you can push the same audio buffers into the Speech SDK for recognition. We have Python samples showing how to use the Speech SDK with a push stream; see the function "speech_recognition_with_push_stream" in this file: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py . However, I'm not familiar with Python options for reading real-time audio buffers from a microphone and writing them to a WAV file. Darren
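
A minimal sketch of that push-stream approach, assuming pyaudio for the microphone capture: the constants RATE, CHUNK and SECONDS, the fixed-length capture loop, and the capture.wav filename are illustrative choices, not part of the SDK sample.

#!/usr/bin/env python3

# Sketch only: feed microphone audio to the Speech SDK through a push stream
# while keeping a copy of the same buffers for a WAV file.
import os, wave
import pyaudio
import azure.cognitiveservices.speech as speechsdk

RATE, CHUNK, SECONDS = 16000, 1024, 5   # 16 kHz, 16-bit, mono PCM

# The recognizer reads from a push stream instead of the default microphone.
fmt = speechsdk.audio.AudioStreamFormat(samples_per_second=RATE,
                                        bits_per_sample=16, channels=1)
push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
speech_config = speechsdk.SpeechConfig(subscription=os.environ['SPEECH_KEY'],
                                       region=os.environ['SPEECH_REGION'])
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioConfig(stream=push_stream))

result_future = speech_recognizer.recognize_once_async()  # recognition runs in the background

pa = pyaudio.PyAudio()
mic = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
              input=True, frames_per_buffer=CHUNK)
frames = []
for _ in range(int(RATE / CHUNK * SECONDS)):  # fixed-length capture for simplicity
    buf = mic.read(CHUNK)
    frames.append(buf)        # keep a copy for the WAV file
    push_stream.write(buf)    # push the same buffer to the recognizer
sample_width = pa.get_sample_size(pyaudio.paInt16)  # bytes per sample (2)
mic.stop_stream()
mic.close()
pa.terminate()
push_stream.close()           # signal end of audio to the SDK

print(result_future.get().text)  # block until the service returns a result

with wave.open('capture.wav', 'wb') as wave_file:
    wave_file.setnchannels(1)
    wave_file.setsampwidth(sample_width)
    wave_file.setframerate(RATE)
    wave_file.writeframes(b''.join(frames))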

If you use Azure's speech_recognizer.recognize_once_async(), you can simultaneously capture the microphone with pyaudio. Below is the code I use:

#!/usr/bin/env python3

# enter your output path here:
output_file='/Users/username/micaudio.wav'

import pyaudio, signal, sys, os, requests, wave
pa = pyaudio.PyAudio()
import azure.cognitiveservices.speech as speechsdk

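# pyaudio stream callback: runs on a background thread and buffers each mic chunk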
def vocrec_callback(in_data, frame_count, time_info, status):
    global voc_data
    voc_data['frames'].append(in_data)
    return (in_data, pyaudio.paContinue)

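# open a non-blocking pyaudio input stream; chunks are delivered to vocrec_callback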
def vocrec_start():
    global voc_stream
    global voc_data
    voc_data = {
        'channels':1 if sys.platform == 'darwin' else 2,
        'rate':44100,
        'width':pa.get_sample_size(pyaudio.paInt16),
        'format':pyaudio.paInt16,
        'frames':[]
    }
    voc_stream = pa.open(format=voc_data['format'],
                    channels=voc_data['channels'],
                    rate=voc_data['rate'],
                    input=True,
                    output=False,
                    stream_callback=vocrec_callback)
    
def vocrec_stop():
    voc_stream.close()

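# write the buffered chunks out as a WAV file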
def vocrec_write():
    with wave.open(output_file, 'wb') as wave_file:
        wave_file.setnchannels(voc_data['channels'])
        wave_file.setsampwidth(voc_data['width'])
        wave_file.setframerate(voc_data['rate'])
        wave_file.writeframes(b''.join(voc_data['frames']))

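# Ctrl+C handler: stop the stream and exit without writing the WAV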
class SIGINT_handler():
    def __init__(self):
        self.SIGINT = False
    def signal_handler(self, signal, frame):
        self.SIGINT = True
        print('You pressed Ctrl+C!')
        vocrec_stop()
        quit()

def init_azure():
    global speech_recognizer
    #  ——— check azure keys
    my_speech_key = os.getenv('SPEECH_KEY')
    if my_speech_key is None:
        error_and_quit("Error: No Azure Key.")
    my_speech_region = os.getenv('SPEECH_REGION')
    if my_speech_region is None:
        error_and_quit("Error: No Azure Region.")
    _headers = {
        'Ocp-Apim-Subscription-Key': my_speech_key,
        'Content-type': 'application/x-www-form-urlencoded',
        # 'Content-Length': '0',
    }
    _URL = f"https://{my_speech_region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    _response = requests.post(_URL,headers=_headers)
    if not "200" in str(_response):
        error_and_quit("Error: Wrong Azure Key Or Region.")
    #  ——— keys correct. continue
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'),
                                           region=os.environ.get('SPEECH_REGION'))
    audio_config_stt = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary, 'true')
    #  ——— disable profanity filter:
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_ProfanityOption, "2")
    speech_config.speech_recognition_language="en-US"
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config_stt)

def error_and_quit(_error):
    print(_error)
    quit()

def recognize_speech ():
    vocrec_start()
    print("Say something: ")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    print("Recording done.")
    vocrec_stop()
    vocrec_write()
    quit()

handler = SIGINT_handler()
signal.signal(signal.SIGINT, handler.signal_handler)

init_azure()
recognize_speech()
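
Note that in this approach the Speech SDK and pyaudio each open the default microphone independently: pyaudio records its own stream at 44.1 kHz (mono on macOS, stereo elsewhere) for the WAV file, while the recognizer captures the device through use_default_microphone=True. The pyaudio callback runs on a background thread, so recording proceeds concurrently with the blocking recognize_once_async().get() call, and the saved WAV also covers the moments before and after the recognized utterance.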

Any update on this feature? It would be great to have it.
