A use case for my app is to convert speech (single word utterances) to text. I need to use Azure speech to text for this. Sometimes the speech needs to be converted into an integer - I need to submit the response as a quantity for example. My question is is there anyway, via the REST API, to tell the speech to text service I want a numeric result? Currently it is returning things like 'one' instead of '1' and 'free' instead of '3'. I don't think there is a way to do this from the documentation but I wanted to see if anyone else has solved this problem before I think of a way around it. This is the code I am using in my proof of concept project:
public static async Task SpeechToTextAsync(MemoryStream data, ISpeechResultCallback callBack)
{
string accessToken = await Authentication.GetAccessToken();
IToast toastWrapper = DependencyService.Get<IToast>();
if (accessToken != null)
{
toastWrapper.Show("Acquired token");
callBack.SpeechReturned("Acquired token");
using (var client = new HttpClient())
{
HttpWebRequest request = (HttpWebRequest)HttpWebRequest.Create("https://westus.stt.speech.microsoft.com/speech/recognition/conversation/cognitiveservices/v1?language=en-GB&format=detailed");
request.SendChunked = true;
request.Accept = @"application/json;text/xml";
request.Method = "POST";
request.ProtocolVersion = HttpVersion.Version11;
request.Host = "westus.stt.speech.microsoft.com";
request.ContentType = @"audio/wav; codecs=audio/pcm; samplerate=16000";
// request.Headers["Ocp-Apim-Subscription-Key"] = Program.SubscriptionKey;
request.Headers.Add("Authorization", "Bearer " + accessToken);
request.AllowWriteStreamBuffering = false;
data.Position = 0;
byte[] buffer = null;
int bytesRead = 0;
using (Stream requestStream = request.GetRequestStream())
{
buffer = new Byte[checked((uint)Math.Min(1024, (int)data.Length))];
while ((bytesRead = data.Read(buffer, 0, buffer.Length)) != 0)
{
requestStream.Write(buffer, 0, bytesRead);
}
// Flush
requestStream.Flush();
}
try
{
string responseData = null;
using (WebResponse response = request.GetResponse())
{
var encoding = Encoding.GetEncoding(((HttpWebResponse)response).CharacterSet);
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, encoding))
{
responseData = reader.ReadToEnd();
AzureSTTResults deserializedProduct = JsonConvert.DeserializeObject<AzureSTTResults>(responseData);
if(deserializedProduct == null || deserializedProduct.NBest == null || deserializedProduct.NBest.Length == 0)
{
toastWrapper.Show("No results");
callBack.SpeechReturned("No results");
}
else
{
toastWrapper.Show(deserializedProduct.NBest[0].ITN);
callBack.SpeechReturned(deserializedProduct.NBest[0].ITN);
}
}
}
}
}
catch (Exception ex)
{
toastWrapper.Show(ex.Message);
callBack.SpeechReturned(ex.Message);
}
}
}
else
{
toastWrapper.Show("No token required");
callBack.SpeechReturned("No token required");
}
}
And here is an example of the result that I would like to be '1':
{
"RecognitionStatus": "Success",
"Offset": 0,
"Duration": 22200000,
"NBest": [
{
"Confidence": 0.43084684014320374,
"Lexical": "one",
"ITN": "One",
"MaskedITN": "One",
"Display": "One."
}
]
}
According to the offical document Speech-to-text REST API
, there is no option can help converting the numberic words to numbers.
Considering for the numberic words in English have the pattern in syntax, you can use a simple algorithm to implement the feature for converting words to numbers. As references, you can follow these below to write your own one in C# by yourself.
Hope it helps.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.