
How to identify the speaker using the Python SDK with the Azure Cognitive Services speech translation API?

I am trying to use a modified version of the event-based synthesis code sample from the Azure documentation for speech-to-speech translation. During the process, I also want to identify the speakers (speaker1, speaker2), but I don't see a function in the Python SDK that would help me identify speakers as part of speech-to-text translation. Can someone suggest a way to identify speakers during speech-to-text translation? Below is the code snippet:

import time
import azure.cognitiveservices.speech as speechsdk

# speech_key, service_region, from_language, to_language and filename
# are assumed to be defined earlier in the script.

def translate_speech_to_text():

    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=speech_key, region=service_region)
    translation_config.speech_recognition_language = from_language
    translation_config.add_target_language(to_language)
    translation_config.voice_name = "en-GB-Susan"

    # Word-level timestamps only appear in the detailed output format.
    translation_config.request_word_level_timestamps()
    translation_config.output_format = speechsdk.OutputFormat.Detailed

    audio_input = speechsdk.AudioConfig(filename=filename)
    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config, audio_config=audio_input)

    done = False

    def stop_cb(evt):
        """Callback that stops continuous recognition upon receiving an event `evt`."""
        recognizer.stop_continuous_recognition()
        nonlocal done
        done = True

    all_results = []

    def handle_final_result(evt):
        # Collect the raw JSON of each final result.
        all_results.append(evt.result.json)

    recognizer.recognized.connect(handle_final_result)
    # Connect callbacks to the events fired by the recognizer
    recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    # Stop on session end or cancellation (e.g. end of file, or an error)
    recognizer.session_stopped.connect(stop_cb)
    recognizer.canceled.connect(stop_cb)

    def synthesis_callback(evt):
        print('Audio: {}'.format(len(evt.result.audio)))
        print('Reason: {}'.format(evt.result.reason))
        # Append, so audio from successive synthesizing events is not overwritten.
        with open('out.wav', 'ab') as wavfile:
            wavfile.write(evt.result.audio)

    recognizer.synthesizing.connect(synthesis_callback)
    recognizer.start_continuous_recognition()

    while not done:
        time.sleep(.5)

    print("Printing all results:")
    print(all_results)

translate_speech_to_text()

If you want to identify speakers, you should use the Speaker Recognition feature of the Speech service.

I recommend using the REST API.

Text Independent - Identify Single Speaker

The Speech service has SDKs for C#, C++ and JavaScript, plus a REST API, that can perform Speaker Recognition. (I searched the Python SDK, but I haven't found a method that can be used directly for speaker identification.)

Suggestion

1. It is recommended to read the Speaker Recognition documentation carefully to understand how to use this service.

2. It is recommended to use the `requests` library to send HTTP POST requests.
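To make suggestion 2 concrete, here is a minimal sketch of identifying a single speaker with `requests`. The endpoint shape, header names and response fields are based on the v2.0 text-independent Speaker Recognition REST API and may have changed, so verify them against the current Azure documentation; the region, key and profile IDs below are placeholders, and speaker profiles must already have been created and enrolled.

```python
def build_identify_request(region, key, profile_ids):
    """Build the URL and headers for an identifySingleSpeaker call.

    The URL layout follows the v2.0 text-independent identification
    API; double-check it against the current Azure docs.
    """
    url = ("https://{0}.api.cognitive.microsoft.com"
           "/speaker/identification/v2.0/text-independent/profiles"
           "/identifySingleSpeaker?profileIds={1}"
           .format(region, ",".join(profile_ids)))
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "audio/wav",
    }
    return url, headers

def identify_speaker(wav_path, region, key, profile_ids):
    """POST a WAV file and return the service's JSON response,
    which should name the best-matching profile and a score."""
    import requests  # third-party: pip install requests

    url, headers = build_identify_request(region, key, profile_ids)
    with open(wav_path, "rb") as audio:
        resp = requests.post(url, headers=headers, data=audio)
    resp.raise_for_status()
    return resp.json()

# Example (placeholders -- substitute your own values):
# result = identify_speaker("out.wav", "westus", "<speech-key>",
#                           ["<profile-id-1>", "<profile-id-2>"])
```

To distinguish speaker1 and speaker2 in your translation scenario, you would enroll one profile per speaker first, then send each audio segment to this endpoint and map the returned profile ID back to a speaker label.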
