
SFSpeechRecognizer - detect end of utterance

I am hacking on a little project that uses the speech recognition built into iOS 10. I have it working with the device's microphone, and my speech is recognized very accurately.

My problem is that the recognition task callback is called for every available partial transcription, and I want it to detect when the person has stopped talking and then call the callback with the isFinal property set to true. That is not happening - the app listens indefinitely.

Is SFSpeechRecognizer capable of detecting the end of a sentence at all?

Here's my code. It is based on an example found on the Internet and is mostly the boilerplate needed to recognize speech from the microphone. I modified it by adding a recognition taskHint. I also set shouldReportPartialResults to false, but that seems to have been ignored.

    func startRecording() {

        if recognitionTask != nil {
            recognitionTask?.cancel()
            recognitionTask = nil
        }

        let audioSession = AVAudioSession.sharedInstance()
        do {
            try audioSession.setCategory(AVAudioSessionCategoryRecord)
            try audioSession.setMode(AVAudioSessionModeMeasurement)
            try audioSession.setActive(true, with: .notifyOthersOnDeactivation)
        } catch {
            print("audioSession properties weren't set because of an error.")
        }

        recognitionRequest = SFSpeechAudioBufferRecognitionRequest()
        recognitionRequest?.shouldReportPartialResults = false
        recognitionRequest?.taskHint = .search

        guard let inputNode = audioEngine.inputNode else {
            fatalError("Audio engine has no input node")
        }

        guard let recognitionRequest = recognitionRequest else {
            fatalError("Unable to create an SFSpeechAudioBufferRecognitionRequest object")
        }

        recognitionRequest.shouldReportPartialResults = true

        recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in

            var isFinal = false

            if result != nil {
                print("RECOGNIZED \(result?.bestTranscription.formattedString)")
                self.transcriptLabel.text = result?.bestTranscription.formattedString
                isFinal = (result?.isFinal)!
            }

            if error != nil || isFinal {
                self.state = .Idle

                self.audioEngine.stop()
                inputNode.removeTap(onBus: 0)

                self.recognitionRequest = nil
                self.recognitionTask = nil

                self.micButton.isEnabled = true

                self.say(text: "OK. Let me see.")
            }
        })

        let recordingFormat = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 1024, format: recordingFormat) { (buffer, when) in
            self.recognitionRequest?.append(buffer)
        }

        audioEngine.prepare()

        do {
            try audioEngine.start()
        } catch {
            print("audioEngine couldn't start because of an error.")
        }

        transcriptLabel.text = "Say something, I'm listening!"

        state = .Listening
    }

It seems that the isFinal flag doesn't become true when the user stops talking, as you expected. I guess this is intended behaviour on Apple's part, because "the user stopped talking" is an undefined event.

I believe that the easiest way to achieve your goal is to do the following:

  • You have to establish an "interval of silence": if the user doesn't talk for longer than this interval (e.g. 2 seconds), he has stopped talking.

  • Create a Timer at the beginning of the audio session:

var timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)

  • When you get new transcriptions in the recognitionTask, invalidate and restart your timer:

    timer.invalidate()
    timer = Timer.scheduledTimer(timeInterval: 2, target: self, selector: #selector(didFinishTalk), userInfo: nil, repeats: false)

  • If the timer fires, it means the user hasn't talked for 2 seconds. You can then safely stop the audio session and exit (a minimal sketch follows below).
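A minimal sketch of those bullets, written against the code in the question. The silenceTimer property, the 2-second interval and the restartSilenceTimer()/didFinishTalk() names are made up for illustration; they are not part of any Apple API:

var silenceTimer: Timer?

// Call this from the recognitionTask result handler every time `result != nil`,
// i.e. whenever a new partial transcription arrives.
func restartSilenceTimer() {
    silenceTimer?.invalidate()
    silenceTimer = Timer.scheduledTimer(timeInterval: 2,
                                        target: self,
                                        selector: #selector(didFinishTalk),
                                        userInfo: nil,
                                        repeats: false)
}

@objc func didFinishTalk() {
    // No new partial result for 2 seconds: treat this as the end of the utterance.
    // endAudio() makes the recognizer deliver one last result with isFinal == true,
    // so the existing `error != nil || isFinal` branch in the handler cleans up
    // the audio engine and the tap.
    recognitionRequest?.endAudio()
}

Note that with this approach shouldReportPartialResults has to stay true, because the partial results are what keep resetting the timer.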

Based on my testing on iOS 10, when shouldReportPartialResults is set to false you have to wait about 60 seconds before the result arrives, which matches the roughly one-minute limit on audio duration that Apple documents for speech recognition requests.

I am currently using speech-to-text in an app and it is working fine for me. My recognitionTask block is as follows:

recognitionTask = speechRecognizer?.recognitionTask(with: recognitionRequest, resultHandler: { (result, error) in
    var isFinal = false

    if let result = result, result.isFinal {
        print("Result: \(result.bestTranscription.formattedString)")
        isFinal = result.isFinal
        completion(result.bestTranscription.formattedString, nil)
    }

    if error != nil || isFinal {
        self.audioEngine.stop()
        inputNode.removeTap(onBus: 0)

        self.recognitionRequest = nil
        self.recognitionTask = nil
        completion(nil, error)
    }
})
To detect the end of the utterance I also keep a timer running, restarting it inside the result handler every time a new result comes in:

if result != nil {
    self.timerDidFinishTalk.invalidate()
    self.timerDidFinishTalk = Timer.scheduledTimer(timeInterval: TimeInterval(self.listeningTime), target: self, selector: #selector(self.didFinishTalk), userInfo: nil, repeats: false)

    let bestString = result?.bestTranscription.formattedString

    self.fullsTring = bestString!.trimmingCharacters(in: .whitespaces)
    self.st = self.fullsTring
}

Here self.listeningTime is how long you want to wait after the last recognized speech before treating the utterance as finished and stopping.

I have a different approach that I find far more reliable in determining when the recognitionTask is done guessing: the confidence score.

When shouldReportPartialResults is set to true, the partial results will have a confidence score of 0.0. Only the final guess will come back with a score over 0.

recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in

    if let result = result {
        let confidence = result.bestTranscription.segments[0].confidence
        print(confidence)
        self.transcript = result.bestTranscription.formattedString
    }

}

The segments array above contains each word in the transcription. 0 is the safest index to examine, so I tend to use that one.

How you use it is up to you, but if all you want to do is know when the guesser is done guessing, you can just call:

let myIsFinal = confidence > 0.0

You can also look at the score (1.0 is totally confident) and group responses into bands from low to high confidence if that helps your application.
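For example, a rough sketch of that check inside the handler above. What you do once the confidence becomes non-zero is up to you; here I just stop the engine and end the request, reusing the property names from the question:

recognitionTask = speechRecognizer.recognitionTask(with: recognitionRequest) { result, error in

    guard let result = result else { return }

    self.transcript = result.bestTranscription.formattedString

    // Partial guesses come back with a confidence of 0.0; only the final guess
    // carries a non-zero score (segment confidence ranges from 0.0 to 1.0).
    let confidence = result.bestTranscription.segments.first?.confidence ?? 0.0

    if confidence > 0.0 {
        // The recognizer is done guessing: stop capturing and end the request.
        self.audioEngine.stop()
        self.recognitionRequest?.endAudio()
    }
}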
