I am trying to use a modified version of the event-based synthesis code sample from the Azure documentation for speech-to-speech translation. During the process, I also want to identify the speakers (speaker1, speaker2), but I don't see a function in the Python SDK that helps me identify speakers as part of speech-to-text translation. Can someone suggest a way to identify speakers during speech-to-text translation? Below is the code snippet:
import time
import azure.cognitiveservices.speech as speechsdk

def translate_speech_to_text():
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region)
    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)
    translation_config.voice_name = "en-GB-Susan"
    translation_config.request_word_level_timestamps()
    translation_config.output_format = speechsdk.OutputFormat(0)
    audio_input = speechsdk.AudioConfig(filename=filename)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_input)

    done = False

    def stop_cb(evt):
        """Callback that stops continuous recognition upon receiving an event `evt`."""
        # print('CLOSING on {}'.format(evt))
        recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    all_results = []

    def handle_final_result(evt):
        # all_results.append(evt.result.text)
        # all_results.append(evt.result.translations['en'])
        all_results.append(evt.result.json)

    recognizer.recognized.connect(handle_final_result)

    # Connect callbacks to the events fired by the speech recognizer
    recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED: {}'.format(evt)))
    # recognizer.canceled.connect(lambda evt: print('CANCELED: {}'.format(evt)))

    recognizer.session_stopped.connect(stop_cb)
    recognizer.canceled.connect(stop_cb)

    def synthesis_callback(evt):
        print('Audio: {}'.format(len(evt.result.audio)))
        print('Reason: {}'.format(evt.result.reason))
        with open('out.wav', 'wb') as wavfile:
            wavfile.write(evt.result.audio)

    recognizer.synthesizing.connect(synthesis_callback)
    recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    print("Printing all results:")
    print(all_results)

translate_speech_to_text()
If you want to identify speakers, you should use the Speaker Recognition feature of the Speech service (Text Independent - Identify Single Speaker). The Speech service has complete SDKs for C#, C++, and JavaScript, plus a REST API, which can perform Speaker Recognition. (I searched the Python SDK, but I haven't found a method that can be used directly for speaker identification.)
1. It is recommended to read the Speech service documentation carefully to learn how this service is used.
2. It is recommended to use the `requests` library to send HTTP POST requests.
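A minimal sketch of calling the Speaker Recognition REST API with `requests` might look like the following. Note the endpoint paths, JSON fields, and the `identifySingleSpeaker` operation are based on the v2.0 text-independent Speaker Recognition REST API and should be verified against the current REST reference; `SPEECH_KEY` and `SERVICE_REGION` are placeholders for your own resource values.

```python
import requests

# Hypothetical placeholders -- replace with your own Speech resource values.
SPEECH_KEY = "<your-speech-key>"
SERVICE_REGION = "westus"

BASE_URL = ("https://{region}.api.cognitive.microsoft.com"
            "/speaker/identification/v2.0/text-independent")

def profile_url(region):
    """Build the enrollment-profile endpoint for the given region."""
    return BASE_URL.format(region=region) + "/profiles"

def create_profile(key, region, locale="en-us"):
    """Create a text-independent identification profile; returns its id."""
    resp = requests.post(
        profile_url(region),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        json={"locale": locale},
    )
    resp.raise_for_status()
    return resp.json()["profileId"]

def identify_speaker(key, region, profile_ids, wav_bytes):
    """Ask the service which enrolled profile best matches the audio."""
    resp = requests.post(
        BASE_URL.format(region=region) + "/profiles/identifySingleSpeaker",
        params={"profileIds": ",".join(profile_ids)},
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "audio/wav"},
        data=wav_bytes,
    )
    resp.raise_for_status()
    return resp.json()
```

You would first create and enroll a profile per speaker, then pass each audio segment to `identify_speaker` and merge the returned profile id with your translation results by timestamp.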