I'm using Azure Speech to Text to find the timestamps of utterances in a WAV file.
The problem I'm encountering is that when the user has recorded numbers, for instance "I'm going to count to three. One, two, three, here I come", the numbers are omitted from the output. This happens both for English and for other languages. I can understand utterances like 'eh' and 'ah' being omitted, but numbers? Why is that the default?
I'm using:
Can I somehow configure the SpeechRecognizer differently so it also outputs numbers?
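Converting a .wav audio file to text without loss of data:
using System;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

string speechKey = "<Your_Key>";
string speechRegion = "<Your_Region>";
var speechConfig = SpeechConfig.FromSubscription(speechKey, speechRegion);
speechConfig.SpeechRecognitionLanguage = "en-US";
using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
using var speechRecognizer = new SpeechRecognizer(speechConfig, audioConfig);
// RecognizeOnceAsync returns a single utterance: recognition ends at the first
// silence (or after about 15 seconds of audio), so later sentences are not included.
var speechRecognitionResult = await speechRecognizer.RecognizeOnceAsync();
Console.WriteLine(speechRecognitionResult.Text);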
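In the output, however, if there is a pause between "I'm going to count to three." and "One, two, three, here I come.", the second sentence is omitted from the result. This looks like a bug in the recognition model, but it is most likely the documented behavior of RecognizeOnceAsync, which returns only a single utterance and stops at the first silence, so continuous recognition is needed to transcribe the whole file.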
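Because RecognizeOnceAsync stops after the first utterance, a minimal sketch of continuous recognition follows. It assumes the Microsoft.CognitiveServices.Speech NuGet package, a .NET 6+ console app with top-level statements, and the same placeholder key, region and file path as the code above. It also prints each utterance's start and end time, derived from OffsetInTicks (100-nanosecond ticks from the start of the audio) and Duration, which is what the timestamp use case needs. Treat it as a starting point rather than a definitive implementation.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Placeholders: replace with your own key, region and WAV path.
var speechConfig = SpeechConfig.FromSubscription("<Your_Key>", "<Your_Region>");
speechConfig.SpeechRecognitionLanguage = "en-US";

using var audioConfig = AudioConfig.FromWavFileInput("<Path to File>");
using var recognizer = new SpeechRecognizer(speechConfig, audioConfig);

// Completed when the session ends (end of file, or an error).
var stopRecognition = new TaskCompletionSource<int>(TaskCreationOptions.RunContinuationsAsynchronously);

// Fires once per recognized utterance, including those that follow a pause.
recognizer.Recognized += (s, e) =>
{
    if (e.Result.Reason == ResultReason.RecognizedSpeech)
    {
        // Offset is measured from the start of the audio in 100 ns ticks.
        var start = TimeSpan.FromTicks(e.Result.OffsetInTicks);
        var end = start + e.Result.Duration;
        Console.WriteLine($"[{start} - {end}] {e.Result.Text}");
    }
};

// Reaching the end of the file also raises Canceled (with EndOfStream).
recognizer.Canceled += (s, e) =>
{
    if (e.Reason == CancellationReason.Error)
    {
        Console.WriteLine($"Error: {e.ErrorCode} {e.ErrorDetails}");
    }
    stopRecognition.TrySetResult(0);
};

recognizer.SessionStopped += (s, e) => stopRecognition.TrySetResult(0);

await recognizer.StartContinuousRecognitionAsync();
await stopRecognition.Task;
await recognizer.StopContinuousRecognitionAsync();
```

With this approach both sentences of the example recording should appear as separate lines, each with its own time range.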
Also, I couldn't find anything in the MS Docs page for the AudioConfig class that lets you configure the audio settings to work around this issue.
I found why my results were missing numbers: the error was in my own code. In my post-processing I was trying to remove punctuation marks from the result, and in doing so I was accidentally removing the numbers as well.
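A minimal sketch of how this kind of post-processing mistake can happen, assuming the recognized display text renders spoken numbers as digits (for example "1, 2, 3"), which is typical for Azure's default display output. The function names and regular expressions are hypothetical illustrations, not the asker's actual code. Removing only Unicode punctuation (`\p{P}`) keeps the digits, whereas a letters-only filter silently drops them.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: removes Unicode punctuation (periods, commas,
// apostrophes, ...) and leaves letters, digits and whitespace intact.
static string StripPunctuation(string text) =>
    Regex.Replace(text, @"\p{P}", string.Empty);

// Hypothetical buggy helper: keeps only ASCII letters and whitespace,
// so digits (and non-ASCII letters) are deleted along with punctuation.
static string StripNonLettersBuggy(string text) =>
    Regex.Replace(text, @"[^a-zA-Z\s]", string.Empty);

var recognized = "I'm going to count to 3. 1, 2, 3, here I come.";

// prints: Im going to count to 3 1 2 3 here I come
Console.WriteLine(StripPunctuation(recognized));

// prints the same sentence with every digit removed
Console.WriteLine(StripNonLettersBuggy(recognized));
```

Using a Unicode category such as `\p{P}` also keeps accented letters intact, which matters since the problem was observed in languages other than English too.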