
Azure speech-to-text - Continuous Recognition

I would like to evaluate the accuracy of the Azure Speech service, specifically speech-to-text on an audio file.

I have been reading the documentation at https://docs.microsoft.com/en-us/python/api/azure-cognitiveservices-speech/?view=azure-python and experimenting with the suggested code from the Microsoft quickstart page. The code works fine and I get some transcription, but it only transcribes the beginning of the audio (the first utterance):

import azure.cognitiveservices.speech as speechsdk

speechKey = 'xxx'
service_region = 'westus'

speech_config = speechsdk.SpeechConfig(subscription=speechKey, region=service_region, speech_recognition_language="es-MX")
audio_config = speechsdk.audio.AudioConfig(use_default_microphone=False, filename='lala.wav')

sr = speechsdk.SpeechRecognizer(speech_config, audio_config)

es = speechsdk.EventSignal(sr.recognized, sr.recognized)

result = sr.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))

Based on the documentation, it looks like I have to use signals and events to capture the full audio with the start_continuous_recognition method (which is not documented for Python, although the method and its related classes appear to be implemented). I tried to follow other examples in C# and Java but was not able to implement this in Python.

Has anyone been able to do this who can provide some pointers? Thank you very much!

You could try this:

import azure.cognitiveservices.speech as speechsdk
import time
speech_key, service_region = "xyz", "WestEurope"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region, speech_recognition_language="it-IT")
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
speech_recognizer.session_stopped.connect(lambda evt: print('\nSESSION STOPPED {}'.format(evt)))
speech_recognizer.recognized.connect(lambda evt: print('\n{}'.format(evt.result.text)))

print('Say a few words\n\n')
speech_recognizer.start_continuous_recognition()
time.sleep(10)
speech_recognizer.stop_continuous_recognition()

speech_recognizer.session_started.disconnect_all()
speech_recognizer.recognized.disconnect_all()
speech_recognizer.session_stopped.disconnect_all()

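Remember to set your preferred language. Note that this snippet passes no audio_config, so it listens on the default microphone for 10 seconds; to read from a file, pass an audio_config created with a filename. It's not much, but it's a good starting point, and it works. I will continue experimenting.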

Check the Azure Python sample: https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py

Or the samples for other languages: https://github.com/Azure-Samples/cognitive-services-speech-sdk/tree/master/samples

Basically, the following:

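import time
import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region and weatherfilename must be defined beforehand
# (your subscription key, your region, and the path to a WAV file).

def speech_recognize_continuous_from_file():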
    """performs continuous speech recognition with input from an audio file"""
    # <SpeechContinuousRecognitionWithFile>
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        """callback that stops continuous recognition upon receiving an event `evt`"""
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    # Start continuous speech recognition
    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)
    # </SpeechContinuousRecognitionWithFile>
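
The sample above treats every canceled event the same way. As a hedged sketch, the callback could instead inspect evt.cancellation_details to tell a normal end of file apart from a real error such as a bad key or wrong region. The function name describe_cancellation is hypothetical. The reason and error_details fields are the same ones the question's code reads from cancellation_details. The snippet assumes that CancellationReason is a Python enum with members named Error and EndOfStream, and compares by member name so that it does not need the SDK to be imported.

```python
def describe_cancellation(details):
    """Hypothetical helper: turn cancellation details into a readable message."""
    # Assumption: details.reason is an enum member; fall back to str() otherwise.
    reason = getattr(details.reason, "name", str(details.reason))
    if reason == "EndOfStream":
        return "Finished: reached the end of the audio."
    if reason == "Error":
        return "Error: {}".format(details.error_details)
    return "Canceled: {}".format(reason)


# Possible wiring, assuming `speech_recognizer` from the sample above:
# speech_recognizer.canceled.connect(
#     lambda evt: print(describe_cancellation(evt.cancellation_details)))
```

With file input, the end-of-stream case is the expected way for recognition to finish, so only the error branch usually needs attention.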

To further improve on @manyways's solution, here is a way to collect the data:

all_results = []

def handle_final_result(evt):
    all_results.append(evt.result.text)

# Connect before starting recognition; all_results holds the full transcript at the end
speech_recognizer.recognized.connect(handle_final_result)
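
Putting the pieces from the answers above together, here is a minimal sketch (not an official sample) that transcribes a whole file and returns the recognized utterances as a list. The helper names build_file_recognizer and transcribe_all and the timeout parameter are assumptions made for illustration; the SDK calls are the ones already shown in this thread. It waits on a threading.Event instead of polling a done flag, and it calls stop_continuous_recognition() from the main thread once a callback has set the event.

```python
import threading


def build_file_recognizer(key, region, wav_path, language="en-US"):
    """Hypothetical helper: create a recognizer that reads from a WAV file."""
    # Imported lazily so transcribe_all() can be tried without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(
        subscription=key, region=region, speech_recognition_language=language)
    audio_config = speechsdk.audio.AudioConfig(filename=wav_path)
    return speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config)


def transcribe_all(recognizer, timeout=None):
    """Run continuous recognition until the session ends; return the final texts."""
    texts = []
    finished = threading.Event()

    def on_recognized(evt):
        # Final result for one utterance; skip empty results (e.g. silence).
        if evt.result.text:
            texts.append(evt.result.text)

    def on_finished(evt):
        # Called on session_stopped or canceled (error or end of the audio).
        finished.set()

    recognizer.recognized.connect(on_recognized)
    recognizer.session_stopped.connect(on_finished)
    recognizer.canceled.connect(on_finished)

    recognizer.start_continuous_recognition()
    finished.wait(timeout)  # returns once a callback sets the event, or on timeout
    recognizer.stop_continuous_recognition()
    return texts


# Possible usage with placeholder credentials and file name:
# sr = build_file_recognizer("xxx", "westus", "lala.wav", language="es-MX")
# print(" ".join(transcribe_all(sr)))
```

Because transcribe_all only relies on the event signals and the start/stop methods, it can be exercised with a stand-in recognizer object before spending quota on the real service.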
