SFSpeechRecognizer - detect end of utterance
I am hacking on a little project using iOS 10's built-in speech recognition. I have working results using the device's microphone, and my speech is recognized very accurately.
My problem is that the recognition task callback is called for every available partial transcription, and I want it to detect that the person has stopped talking and call the callback with the isFinal property set to true. That is not happening: the app listens indefinitely. Is SFSpeechRecognizer even capable of detecting the end of a sentence?
Here's my code. It is based on an example found on the Internet, and it is mostly the boilerplate needed to recognize from the microphone source. I modified it by adding the recognition taskHint. I also set shouldReportPartialResults to false, but it seems to have been ignored.
func startRecording() {
    if recognitionTask != nil {
        recognitionTask?.cancel()
        recognitionTask = nil
    }
    let audioSession = AVAudioSession.sharedInstance()
    do {
        try audioSession.setCategory(AVAudioSessionCategoryRecord)
        try audioSession.setMode(AVAudioSessionModeMeasurement)
        try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
    } catch {
        print("audioSession properties weren't set because of an error.")
    }
    recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
    recognitionRequest?.shouldReportPartialResults = false
    recognitionRequest?.taskHint = .search
    guard let inputNode = audioEngine.inputNode else {
        fatalError("Audio engine has no input node")
    }
    guard let recognitionRequest = recognitionRequest else {
        fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
    }
    recognitionRequest.shouldReportPartialResults = true
    recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
        var isFinal = false
        if result != nil {
            print("RECOGNIZED \(result?.bestTranscription.formattedString)")
            self.transcriptLabel.text = result?.bestTranscription.formattedString
            isFinal = (result?.isFinal)!
        }
        if error != nil || isFinal {
            self.state = .Idle
            self.audioEngine.stop()
            inputNode.removeTap(onBus: 0)
            self.recognitionRequest = nil
            self.recognitionTask = nil
            self.micButton.isEnabled = true
            self.say(text: "OK. Let me see.")
        }
    })
    let recordingFormat = inputNode.outputFormat(forBus: 0)
    inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
        self.recognitionRequest?.append(buffer)
    }
    audioEngine.prepare()
    do {
        try audioEngine.start()
    } catch {
        print("audioEngine couldn't start because of an error.")
    }
    transcriptLabel.text = "Say something, I'm listening!"
    state = .Listening
}
It seems that the isFinal flag doesn't become true when the user stops talking, as you expected. I guess this is intended behaviour by Apple, because "the user stops talking" is not a well-defined event.
I believe that the easiest way to achieve your goal is to do the following:
You have to establish an "interval of silence". That means that if the user doesn't talk for longer than your interval (e.g. 2 seconds), they have stopped talking.
Create a Timer at the beginning of the audio session:
var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
When you get a new transcription in recognitionTask, invalidate and restart your timer:
timer.invalidate()
timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)
If the timer expires, the user has not spoken for 2 seconds. You can safely stop the audio session and exit.
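The invalidate-and-restart steps above can be wrapped in a small helper. This is only a sketch: SilenceTimer and its onSilence callback are hypothetical names, not part of the Speech framework, and the block-based Timer API it uses requires iOS 10 or later.

```swift
import Foundation

/// Restartable one-shot timer: calls `onSilence` once no activity has been
/// reported for `interval` seconds. `SilenceTimer` is a hypothetical helper,
/// not an Apple API.
final class SilenceTimer {
    private let interval: TimeInterval
    private let onSilence: () -> Void
    private var timer: Timer?

    init(interval: TimeInterval, onSilence: @escaping () -> Void) {
        self.interval = interval
        self.onSilence = onSilence
    }

    /// Call when listening starts and again on every new transcription.
    /// Must run on a thread with an active run loop (e.g. the main thread).
    func restart() {
        timer?.invalidate()
        timer = Timer.scheduledTimer(withTimeInterval: interval, repeats: false) { [weak self] _ in
            self?.timer = nil
            self?.onSilence()
        }
    }

    /// Call when recognition ends for any other reason (error, cancel).
    func cancel() {
        timer?.invalidate()
        timer = nil
    }
}
```

In the question's code, restart() would be called once after audioEngine.start() and again inside the result handler whenever result is non-nil. One plausible thing to do in onSilence is to call endAudio() on the recognition request, which marks the end of the audio input so the recognizer can finish with a final result; whether that fits depends on the app.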
According to my tests on iOS 10, when shouldReportPartialResults is set to false, you have to wait 60 seconds to get the result.
I am currently using speech-to-text in an app and it is working fine for me. My recognitionTask block is as follows:
recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
    var isFinal = false
    if let result = result, result.isFinal {
        print("Result: \(result.bestTranscription.formattedString)")
        isFinal = result.isFinal
        completion(result.bestTranscription.formattedString, nil)
    }
    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)
        self.recognitionRequest = nil
        self.recognitionTask = nil
        completion(nil, error)
    }
})
if result != nil {
    self.timerDidFinishTalk.invalidate()
    self.timerDidFinishTalk = Timer.scheduledTimer(timeInterval: TimeInterval(self.listeningTime), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)
    let bestString = result?.bestTranscription.formattedString
    self.fullsTring = bestString!.trimmingCharacters(in: .whitespaces)
    self.st = self.fullsTring
}
Here self.listeningTime is how long to wait after the end of the utterance before stopping.
I have a different approach that I find far more reliable for determining when the recognitionTask is done guessing: the confidence score.
When shouldReportPartialResults is set to true, the partial results will have a confidence score of 0.0. Only the final guess will come back with a score over 0.
recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in
    if let result = result {
        let confidence = result.bestTranscription.segments[0].confidence
        print(confidence)
        self.transcript = result.bestTranscription.formattedString
    }
}
The segments array above contains each word in the transcription. 0 is the safest index to examine, so I tend to use that one.
How you use it is up to you, but if all you want is to know when the guesser is done guessing, you can just use:
let myIsFinal = confidence > 0.0 ? true : false
You can also look at the score itself (1.0 is totally confident) and group responses into low- and high-confidence guesses, if that helps your application.
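As a rough sketch of that grouping, the function below classifies a result from its per-segment confidence values. ConfidenceLevel, classify, and the 0.8 threshold are all assumptions made for illustration, and the rule that partial results report 0.0 is the behaviour observed in this answer rather than a documented guarantee.

```swift
import Foundation

/// Hypothetical coarse buckets; the threshold used below is an arbitrary assumption.
enum ConfidenceLevel {
    case pending  // first segment reports 0.0: treated as a partial result
    case low
    case high
}

/// Classifies a result from its per-segment confidences, e.g.
/// `result.bestTranscription.segments.map { $0.confidence }`.
func classify(confidences: [Float], highThreshold: Float = 0.8) -> ConfidenceLevel {
    // Look at the first segment, as in the answer, but tolerate an empty array
    // instead of indexing with [0].
    guard let first = confidences.first, first > 0 else { return .pending }
    return first >= highThreshold ? .high : .low
}
```

Anything other than .pending can then be treated the same way as myIsFinal above, with .low results optionally confirmed with the user before acting on them.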