
Stream audio from a videoconference to Azure Speech Translation using Python

I am on a Mac and am trying to capture Zoom's audio output as input for Azure speech translation, using Python and BlackHole.

I have Zoom set to [image: enter image description here]

and am setting the Azure translation_recognizer object to 'Multi-Output Device'

[image: enter image description here]

However, when I set device_name for AudioConfig to the virtual device, I get a runtime error which I don't understand and cannot find advice on anywhere.

Stacktrace:

/usr/local/bin/python3.9 /Users/sethhammock/Scripts/translate_speech_continuous.py 
SESSION STARTED: SessionEventArgs(session_id=116958757375422f9b8f5c6a31aed3c9)
Traceback (most recent call last):
  File "/Users/sethhammock/Scripts/translate_speech_continuous.py", line 48, in <module>
    translation_recognizer.start_continuous_recognition()
  File "/usr/local/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech.py", line 664, in start_continuous_recognition
    return self._impl.start_continuous_recognition_async().get()
  File "/usr/local/lib/python3.9/site-packages/azure/cognitiveservices/speech/speech_py_impl.py", line 1978, in get
    return _speech_py_impl.VoidFuture_get(self)
RuntimeError: Exception with an error code: 0x15 (SPXERR_MIC_ERROR)
[CALL STACK BEGIN]

3   libMicrosoft.CognitiveServices.Spee 0x00000001093c464e GetModuleObject + 716126
4   libMicrosoft.CognitiveServices.Spee 0x00000001094322fe GetModuleObject + 1165838
5   libMicrosoft.CognitiveServices.Spee 0x0000000109442c41 GetModuleObject + 1233745
6   libMicrosoft.CognitiveServices.Spee 0x000000010943cabc GetModuleObject + 1208780
7   libMicrosoft.CognitiveServices.Spee 0x000000010943c3c2 GetModuleObject + 1206994
8   libMicrosoft.CognitiveServices.Spee 0x0000000109439941 GetModuleObject + 1196113
9   libMicrosoft.CognitiveServices.Spee 0x0000000109546d1a _ZN13FileBlobWrite11WriteToFileEPviPKc + 671994
10  libMicrosoft.CognitiveServices.Spee 0x0000000109543e52 _ZN13FileBlobWrite11WriteToFileEPviPKc + 660018
11  libMicrosoft.CognitiveServices.Spee 0x00000001094b2f5f _ZN13FileBlobWrite11WriteToFileEPviPKc + 66367
12  libMicrosoft.CognitiveServices.Spee 0x00000001094b08c2 _ZN13FileBlobWrite11WriteToFileEPviPKc + 56482
13  libMicrosoft.CognitiveServices.Spee 0x00000001094cefb6 _ZN13FileBlobWrite11WriteToFileEPviPKc + 181142
14  libMicrosoft.CognitiveServices.Spee 0x0000000109329fc6 GetModuleObject + 83670
15  libMicrosoft.CognitiveServices.Spee 0x0000000109329f59 GetModuleObject + 83561
16  libMicrosoft.CognitiveServices.Spee 0x000000010932be2b GetModuleObject + 91451
17  libMicrosoft.CognitiveServices.Spee 0x000000010932a113 GetModuleObject + 84003
18  libMicrosoft.CognitiveServices.Spee 0x000000010932d4f8 GetModuleObject + 97288
19  libsystem_pthread.dylib             0x00007ff8035464e1 _pthread_start + 125
[CALL STACK END]



Process finished with exit code 1

I thought the runtime error was due to the sample rate of 16 kHz, but with BlackHole that is easy to configure using the simple GUI.

I've discovered that AudioConfig needs an ALSA-style device name passed as device_name=device_name. However, I don't think that will work on macOS, since trying to install alsa-lib from the command line says, "...this requires Linux".

ALSA-style device names look like hw:X,Y, where X is the card number and Y is the device number, if I understand correctly. ALSA apparently works on Debian but not on BSD, which is what macOS is based on, so am I wasting my time trying this?

Can anyone help me understand how to set speechsdk.audio.AudioConfig(device_name="Blackhole 16ch") or speechsdk.audio.AudioConfig(device_name="hw:0,2"), or whether I am missing something about device naming conventions for what I am trying to achieve?

I tried using a file to read, and it works great. It simply reads in my audio file and returns the translation result.

So, if naming the device cannot work because macOS has no ALSA-style naming convention, would it work to write the audio stream to a file and have Azure read that in?

Any ideas are much appreciated!

I used this tool to check: https://github.com/jimbobbennett/AudioIds

I compiled the code to get my BlackHole device name:

2022-09-26 14:39:03.339591+0800 AudioIds[5533:2695040] {
    deviceName = "BlackHole 16ch";
    deviceUID = "BlackHole16ch_UID";
}
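If you have several devices and don't want to copy the UID by hand, output like the above can be parsed with a few lines of Python. This is only a rough sketch: `parse_audio_ids` and `SAMPLE_LOG` are names I made up for illustration, and it assumes every device is printed as a `deviceName` line followed by a `deviceUID` line, as in the log above.

```python
import re

# Sample text copied from the AudioIds output shown above.
SAMPLE_LOG = '''2022-09-26 14:39:03.339591+0800 AudioIds[5533:2695040] {
    deviceName = "BlackHole 16ch";
    deviceUID = "BlackHole16ch_UID";
}'''

# Matches lines such as: deviceName = "BlackHole 16ch";
_PAIR = re.compile(r'(deviceName|deviceUID)\s*=\s*"([^"]*)"')


def parse_audio_ids(log_text):
    """Return a {deviceName: deviceUID} mapping from AudioIds-style output."""
    devices = {}
    name = None
    for key, value in _PAIR.findall(log_text):
        if key == "deviceName":
            name = value  # remember the name until its UID shows up
        elif name is not None:
            devices[name] = value
            name = None
    return devices


if __name__ == "__main__":
    print(parse_audio_ids(SAMPLE_LOG))
```

The value to pass to the Speech SDK is the UID, not the display name.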

I have the same use case as you. With the following setting, transcription works for me:

audio_config = speechsdk.audio.AudioConfig(device_name="BlackHole16ch_UID")
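For the translation case in the question, the same audio_config can be plugged into a TranslationRecognizer. The following is a minimal sketch, not a tested copy of the asker's script: the environment variable names `SPEECH_KEY` and `SPEECH_REGION`, the source language `en-US`, the target language `de`, and the helper `format_translations` are all assumptions chosen for illustration.

```python
import os
import time


def format_translations(translations):
    """Render a {language: text} mapping as one 'lang: text' line per language."""
    return "\n".join(f"{lang}: {text}" for lang, text in sorted(translations.items()))


def main():
    # Imported here so the helper above can be used without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    # Assumption: the credentials are stored in these environment variables.
    translation_config = speechsdk.translation.SpeechTranslationConfig(
        subscription=os.environ["SPEECH_KEY"],
        region=os.environ["SPEECH_REGION"],
    )
    translation_config.speech_recognition_language = "en-US"  # assumed source language
    translation_config.add_target_language("de")  # assumed target language

    # On macOS, pass the Core Audio device UID, not the display name.
    audio_config = speechsdk.audio.AudioConfig(device_name="BlackHole16ch_UID")

    recognizer = speechsdk.translation.TranslationRecognizer(
        translation_config=translation_config,
        audio_config=audio_config,
    )

    def on_recognized(evt):
        # Print only final results that actually carry translations.
        if evt.result.reason == speechsdk.ResultReason.TranslatedSpeech:
            print(format_translations(evt.result.translations))

    recognizer.recognized.connect(on_recognized)
    recognizer.canceled.connect(lambda evt: print("CANCELED:", evt))

    recognizer.start_continuous_recognition()
    try:
        while True:  # keep the process alive until Ctrl+C
            time.sleep(0.5)
    except KeyboardInterrupt:
        pass
    finally:
        recognizer.stop_continuous_recognition()
```

Call `main()` to start listening. Zoom's speaker still has to be routed to BlackHole (directly or through the Multi-Output Device), otherwise the recognizer will only hear silence.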
