I am hacking on a little project using the iOS 10 built-in speech recognition. I have it working with the device's microphone, and my speech is recognized very accurately.
My problem is that the recognition task callback is called for every available partial transcription, and I want it to detect that the person has stopped talking and call the callback with the isFinal property set to true. This is not happening: the app listens indefinitely.
Is SFSpeechRecognizer ever capable of detecting the end of a sentence?
Here's my code. It is based on an example found on the Internet and is mostly the boilerplate needed to recognize speech from the microphone. I modified it by adding a recognition taskHint. I also set shouldReportPartialResults to false, but that setting seems to have been ignored.
func startRecording() {
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }

    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }

    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest?.shouldReportPartialResults = false
    recognitionRequest?.taskHint = .search

    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }

    // NOTE: this overrides the `shouldReportPartialResults = false` set above,
    // which is why that setting appears to be ignored.
    recognitionRequest.shouldReportPartialResults = true

    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { result, error in
        var isFinal = false
        if let result = result {
            print("RECOGNIZED \(result.bestTranscription.formattedString)")
            self.transcriptLabel.text = result.bestTranscription.formattedString
            isFinal = result.isFinal
        }
        if error != nil || isFinal {
            self.state = .Idle
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.micButton.isEnabled = true
            self.say(text: "OK. Let me see.")
        }
    })

    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { buffer, when in
        self.recognitionRequest?.append(buffer)
    }

    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }

    transcriptLabel.text = "Say something, I'm listening!"
    state = .Listening
}
It seems that the isFinal flag doesn't become true when the user stops talking, as you expected. I guess this is intended behaviour on Apple's part, because "the user stops talking" is an undefined event.
I believe the easiest way to achieve your goal is the following:
You have to establish an "interval of silence": if the user doesn't talk for longer than this interval (e.g. 2 seconds), he has stopped talking.
Create a Timer at the beginning of the audio session:
var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
When you get new transcriptions in the recognitionTask, invalidate and restart your timer:
timer.invalidate()
timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
If the timer expires, it means the user hasn't talked for 2 seconds. You can safely stop the audio session and exit.
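The timestamp bookkeeping behind this approach can be separated from the Timer itself. The SilenceDetector below is a hypothetical helper (my own sketch, not part of the Speech framework): the recognitionTask result handler records the time of each partial transcription, and a repeating timer polls userHasStopped(at:) to decide when to stop the audio session. The 2-second interval matches the example above.

```swift
import Foundation

// Hypothetical helper sketching the "interval of silence" idea above.
// Not part of the Speech framework.
struct SilenceDetector {
    let interval: TimeInterval          // e.g. 2 seconds, as in the answer
    private(set) var lastUpdate: Date?

    // Call from the recognitionTask result handler whenever a new
    // partial transcription arrives.
    mutating func registerSpeech(at date: Date = Date()) {
        lastUpdate = date
    }

    // True once `interval` seconds have passed with no new transcription.
    func userHasStopped(at date: Date = Date()) -> Bool {
        guard let last = lastUpdate else { return false }
        return date.timeIntervalSince(last) >= interval
    }
}
```

Keeping this logic out of the Timer callback makes the "has the user stopped?" decision easy to unit-test without a live audio session.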
In my testing on iOS 10, when shouldReportPartialResults is set to false, you have to wait 60 seconds before the result arrives.
I am currently using speech-to-text in an app and it is working fine for me. My recognitionTask block is as follows:
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { result, error in
    var isFinal = false
    if let result = result, result.isFinal {
        print("Result: \(result.bestTranscription.formattedString)")
        isFinal = result.isFinal
        completion(result.bestTranscription.formattedString, nil)
    }
    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)
        self.recognitionRequest = nil
        self.recognitionTask = nil
        completion(nil, error)
    }
})
Inside the result handler I also restart a timer whenever a new result arrives:
if result != nil {
    self.timerDidFinishTalk.invalidate()
    self.timerDidFinishTalk = Timer.scheduledTimer(timeInterval: TimeInterval(self.listeningTime), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)
    let bestString = result?.bestTranscription.formattedString
    self.fullsTring = bestString!.trimmingCharacters(in: .whitespaces)
    self.st = self.fullsTring
}
Here self.listeningTime is how long to wait after the last transcription before treating the utterance as finished.
I have a different approach that I find far more reliable for determining when the recognitionTask is done guessing: the confidence score.
When shouldReportPartialResults is set to true, the partial results will have a confidence score of 0.0. Only the final guess comes back with a score greater than 0.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result {
        let confidence = result.bestTranscription.segments[0].confidence
        print(confidence)
        self.transcript = result.bestTranscription.formattedString
    }
}
The segments array above contains one entry per word in the transcription. Index 0 is the safest one to examine, so I tend to use that.
How you use it is up to you, but if all you want to know is whether the guesser is done guessing, you can just check:
let myIsFinal = confidence > 0.0
You can also look at the score itself (confidence ranges from 0.0 to 1.0, with 1.0 being totally confident) and group responses into low- to high-confidence guesses if that helps your application.
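That grouping idea can be sketched as a small pure function over the segment's confidence value. The enum name and the 0.5 cut-off below are my own assumptions, not anything from the Speech API; tune the threshold for your application.

```swift
// Buckets for a guess, based on SFTranscriptionSegment.confidence,
// which ranges from 0.0 to 1.0. The 0.5 cut-off is an arbitrary
// assumption.
enum GuessQuality {
    case partial  // confidence == 0: recognizer is still guessing
    case low      // final, but the recognizer is unsure
    case high     // final and confident
}

func classify(confidence: Float) -> GuessQuality {
    switch confidence {
    case 0:       return .partial
    case ..<0.5:  return .low
    default:      return .high
    }
}
```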