Save microphone audio input when using Azure speech to text

I'm currently using Azure speech to text in my project. It recognizes speech input directly from the microphone (which is what I want) and saves the text output, but I'm also interested in saving that audio input so that I can listen to it later on. Before moving to Azure I was using the Python speech_recognition library with recognize_google, which allowed me to use get_wav_data() to save the input as a .wav file. Is there something similar I can use with Azure? I read the documentation but could only find ways to save audio files for text to speech. My temporary solution is to save the audio input myself first and then run Azure STT on that audio file rather than using the microphone directly, but I'm worried this will slow down the process. Any ideas? Thank you in advance!

This is Darren from the Microsoft Speech SDK Team. Unfortunately, at the moment there is no built-in support for simultaneously doing live recognition from a microphone and writing the audio to a WAV file. We have heard this customer request before and we will consider adding this feature in a future version of the Speech SDK.

What I think you can do at the moment (it will require a bit of programming on your part) is use the Speech SDK with a push stream. You can write code to read audio buffers from the microphone and write them to a WAV file. At the same time, you can push the same audio buffers into the Speech SDK for recognition. We have Python samples showing how to use the Speech SDK with a push stream; see the function "speech_recognition_with_push_stream" in this file: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py . However, I'm not familiar with Python options for reading real-time audio buffers from a microphone and writing them to a WAV file. Darren
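
A minimal sketch of that push-stream approach, assuming pyaudio for the microphone capture: the constants RATE, CHUNK and SECONDS, the fixed-length capture loop, and the capture.wav filename are illustrative choices, not part of the SDK sample.

#!/usr/bin/env python3

# Sketch only: feed microphone audio to the Speech SDK through a push stream
# while keeping a copy of the same buffers for a WAV file.
import os, wave
import pyaudio
import azure.cognitiveservices.speech as speechsdk

RATE, CHUNK, SECONDS = 16000, 1024, 5   # 16 kHz, 16-bit, mono PCM

# The recognizer reads from a push stream instead of the default microphone.
fmt = speechsdk.audio.AudioStreamFormat(samples_per_second=RATE,
                                        bits_per_sample=16, channels=1)
push_stream = speechsdk.audio.PushAudioInputStream(stream_format=fmt)
speech_config = speechsdk.SpeechConfig(subscription=os.environ['SPEECH_KEY'],
                                       region=os.environ['SPEECH_REGION'])
speech_recognizer = speechsdk.SpeechRecognizer(
    speech_config=speech_config,
    audio_config=speechsdk.audio.AudioConfig(stream=push_stream))

result_future = speech_recognizer.recognize_once_async()  # recognition runs in the background

pa = pyaudio.PyAudio()
mic = pa.open(format=pyaudio.paInt16, channels=1, rate=RATE,
              input=True, frames_per_buffer=CHUNK)
frames = []
for _ in range(int(RATE / CHUNK * SECONDS)):  # fixed-length capture for simplicity
    buf = mic.read(CHUNK)
    frames.append(buf)        # keep a copy for the WAV file
    push_stream.write(buf)    # push the same buffer to the recognizer
sample_width = pa.get_sample_size(pyaudio.paInt16)  # bytes per sample (2)
mic.stop_stream()
mic.close()
pa.terminate()
push_stream.close()           # signal end of audio to the SDK

print(result_future.get().text)  # block until the service returns a result

with wave.open('capture.wav', 'wb') as wave_file:
    wave_file.setnchannels(1)
    wave_file.setsampwidth(sample_width)
    wave_file.setframerate(RATE)
    wave_file.writeframes(b''.join(frames))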

If you use Azure's speech_recognizer.recognize_once_async(), you can simultaneously capture the microphone with pyaudio. Below is the code I use:

#!/usr/bin/env python3

# enter your output path here:
output_file='/Users/username/micaudio.wav'

import pyaudio, signal, sys, os, requests, wave
pa = pyaudio.PyAudio()
import azure.cognitiveservices.speech as speechsdk

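# pyaudio stream callback: runs on a background thread and buffers each mic chunk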
def vocrec_callback(in_data, frame_count, time_info, status):
    global voc_data
    voc_data['frames'].append(in_data)
    return (in_data, pyaudio.paContinue)

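# open a non-blocking pyaudio input stream; chunks are delivered to vocrec_callback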
def vocrec_start():
    global voc_stream
    global voc_data
    voc_data = {
        'channels':1 if sys.platform == 'darwin' else 2,
        'rate':44100,
        'width':pa.get_sample_size(pyaudio.paInt16),
        'format':pyaudio.paInt16,
        'frames':[]
    }
    voc_stream = pa.open(format=voc_data['format'],
                    channels=voc_data['channels'],
                    rate=voc_data['rate'],
                    input=True,
                    output=False,
                    stream_callback=vocrec_callback)
    
def vocrec_stop():
    voc_stream.close()

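# write the buffered chunks out as a WAV file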
def vocrec_write():
    with wave.open(output_file, 'wb') as wave_file:
        wave_file.setnchannels(voc_data['channels'])
        wave_file.setsampwidth(voc_data['width'])
        wave_file.setframerate(voc_data['rate'])
        wave_file.writeframes(b''.join(voc_data['frames']))

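# Ctrl+C handler: stop the stream and exit without writing the WAV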
class SIGINT_handler():
    def __init__(self):
        self.SIGINT = False
    def signal_handler(self, signal, frame):
        self.SIGINT = True
        print('You pressed Ctrl+C!')
        vocrec_stop()
        quit()

def init_azure():
    global speech_recognizer
    #  ——— check azure keys
    my_speech_key = os.getenv('SPEECH_KEY')
    if my_speech_key is None:
        error_and_quit("Error: No Azure Key.")
    my_speech_region = os.getenv('SPEECH_REGION')
    if my_speech_region is None:
        error_and_quit("Error: No Azure Region.")
    _headers = {
        'Ocp-Apim-Subscription-Key': my_speech_key,
        'Content-type': 'application/x-www-form-urlencoded',
        # 'Content-Length': '0',
    }
    _URL = f"https://{my_speech_region}.api.cognitive.microsoft.com/sts/v1.0/issueToken"
    _response = requests.post(_URL,headers=_headers)
    if not "200" in str(_response):
        error_and_quit("Error: Wrong Azure Key Or Region.")
    #  ——— keys correct. continue
    speech_config = speechsdk.SpeechConfig(subscription=os.environ.get('SPEECH_KEY'),
                                           region=os.environ.get('SPEECH_REGION'))
    audio_config_stt = speechsdk.audio.AudioConfig(use_default_microphone=True)
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_RequestSentenceBoundary, 'true')
    #  ——— disable profanity filter:
    speech_config.set_property(speechsdk.PropertyId.SpeechServiceResponse_ProfanityOption, "2")
    speech_config.speech_recognition_language="en-US"
    speech_recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config,
        audio_config=audio_config_stt)

def error_and_quit(_error):
    print(_error)
    quit()

def recognize_speech ():
    vocrec_start()
    print("Say something: ")
    speech_recognition_result = speech_recognizer.recognize_once_async().get()
    print("Recording done.")
    vocrec_stop()
    vocrec_write()
    quit()

handler = SIGINT_handler()
signal.signal(signal.SIGINT, handler.signal_handler)

init_azure()
recognize_speech()
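
Note that in this approach the Speech SDK and pyaudio each open the default microphone independently: pyaudio records its own stream at 44.1 kHz (mono on macOS, stereo elsewhere) for the WAV file, while the recognizer captures the device through use_default_microphone=True. The pyaudio callback runs on a background thread, so recording proceeds concurrently with the blocking recognize_once_async().get() call, and the saved WAV also covers the moments before and after the recognized utterance.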

Any update on this feature? It would be great to have it.
