How to change Azure text to speech silence timeout in JavaScript

I'm using the Azure Speech SDK for speech-to-text transcription with recognizeOnceAsync . The current code resembles:

// SpeechSDK is the global exposed by the browser bundle loaded in the HTML below.
var SpeechSDK, recognizer;
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');
var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
recognizer = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);
new Promise(function (resolve, reject) {
    recognizer.recognizeOnceAsync(
        function (result) {
            recognizer.close();
            recognizer = undefined;
            resolve(result.text);
        },
        function (err) {
            recognizer.close();
            recognizer = undefined;
            reject(err);
        }
    );
}).then(r => {
    console.log(`Azure STT interpreted: ${r}`);
}).catch(e => {
    alert(e);
});

In the HTML file I import the Azure package like so:

<script src="https://aka.ms/csspeech/jsbrowserpackageraw"></script>

The issue is that I would like to increase the amount of "silence time" allowed before the recognizeOnceAsync method returns its result (i.e. you should be able to pause and take a breath without the method assuming you're done talking). Is there any way to do this with fromDefaultMicrophoneInput ? I've tried various things like:

const SILENCE_UNTIL_TIMEOUT_MS = 5000;
speechConfig.SpeechServiceConnection_EndSilenceTimeoutMs = SILENCE_UNTIL_TIMEOUT_MS;
audioConfig.setProperty("Speech_SegmentationSilenceTimeoutMs", SILENCE_UNTIL_TIMEOUT_MS);

but none seem to extend the "silence time allowance" correctly.

This is the resource which I have been looking at: https://docs.microsoft.com/en-us/javascript/api/microsoft-cognitiveservices-speech-sdk/propertyid?view=azure-node-latest
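
For reference, the property-based route I'd expect from that PropertyId documentation looks roughly like the sketch below (timeout values must be passed as strings, and passing a PropertyId value to setProperty assumes a reasonably recent SDK build). As noted above, it doesn't seem to extend the silence allowance for me:

// Sketch of the documented property-based approach; values are strings.
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');

// Silence after speech before recognizeOnceAsync finalizes the result:
speechConfig.setProperty(
    SpeechSDK.PropertyId.SpeechServiceConnection_EndSilenceTimeoutMs,
    "5000"
);

// Silence that closes a phrase segment:
speechConfig.setProperty(
    SpeechSDK.PropertyId.Speech_SegmentationSilenceTimeoutMs,
    "5000"
);

var audioConfig = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
var recognizer  = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);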

Based on what you're describing, you'd need to set the segmentation silence timeout. Unfortunately, there is currently a bug in the JS SDK and PropertyId.Speech_SegmentationSilenceTimeoutMs is not being applied correctly.

As a workaround, you can instead set the segmentation timeout through the connection's message properties, as follows:

import { Connection, SpeechConfig, SpeechRecognizer } from "microsoft-cognitiveservices-speech-sdk";

const subscriptionKey = "SUB_KEY";        // your Speech resource key
const subscriptionRegion = "SUB_REGION";  // your Speech resource region

const speechConfig = SpeechConfig.fromSubscription(subscriptionKey, subscriptionRegion);
speechConfig.speechRecognitionLanguage = "en-US";

const reco = new SpeechRecognizer(speechConfig);

// Override the segmentation settings in the speech.context message that the
// SDK sends to the service when the connection is established.
const conn = Connection.fromRecognizer(reco);
conn.setMessageProperty("speech.context", "phraseDetection", {
    INTERACTIVE: {
        segmentation: {
            mode: "custom",
            segmentationSilenceTimeoutMs: 5000
        }
    },
    mode: "Interactive"
});

reco.recognizeOnceAsync(
    (result) => {
        console.log("Recognition done!");
        // do something with the recognition result
    },
    (error) => {
        console.log("Recognition failed. Error: " + error);
    }
);

Please note that the allowed range for the segmentation timeout is 100-5000 ms (inclusive).
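
If you're loading the SDK from the browser bundle as in the question, the same workaround can be applied through the SpeechSDK global, which exports Connection in current builds. Here is a rough, untested sketch using the question's SUB_KEY / SUB_REGION placeholders and default microphone input:

// Adapting the workaround to the browser/microphone setup from the question.
var speechConfig = SpeechSDK.SpeechConfig.fromSubscription('SUB_KEY', 'SUB_REGION');
var audioConfig  = SpeechSDK.AudioConfig.fromDefaultMicrophoneInput();
var recognizer   = new SpeechSDK.SpeechRecognizer(speechConfig, audioConfig);

// Send the custom segmentation settings in the speech.context message.
var conn = SpeechSDK.Connection.fromRecognizer(recognizer);
conn.setMessageProperty("speech.context", "phraseDetection", {
    INTERACTIVE: {
        segmentation: {
            mode: "custom",
            segmentationSilenceTimeoutMs: 5000   // allowed range: 100-5000 ms
        }
    },
    mode: "Interactive"
});

recognizer.recognizeOnceAsync(
    function (result) {
        console.log("Azure STT interpreted: " + result.text);
        recognizer.close();
    },
    function (err) {
        console.log("Recognition failed: " + err);
        recognizer.close();
    }
);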
