
AVAudioSession issue when using SFSpeechRecognizer after AVSpeechUtterance

I am trying to use SFSpeechRecognizer for speech-to-text after speaking a welcome message to the user via AVSpeechUtterance. But randomly, the speech recognition does not start (after the welcome message is spoken) and it throws the error message below.

[avas] ERROR: AVAudioSession.mm:1049: -[AVAudioSession setActive:withOptions:error:]: Deactivating an audio session that has running I/O. All I/O should be stopped or paused prior to deactivating the audio session.

It works a few times; I am not clear on why it does not work consistently.

I tried the solutions mentioned in other SO posts, which suggest checking whether other audio players are running. I added that check in the speech-to-text part of the code, and it returns false (i.e., no other audio player is running), but the speech-to-text still does not start listening for the user's speech. Can you please guide me on what is going wrong?
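
The check looks roughly like this (based on the otherAudioPlaying property those posts point to):

    if ([AVAudioSession sharedInstance].otherAudioPlaying) {
        // Another app is playing audio; bail out before starting recognition
        NSLog(@"Other audio is playing");
        return;
    }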

I am testing on an iPhone 6 running iOS 10.3.

Below are code snippets used:

TextToSpeech:

- (void) speak:(NSString *) textToSpeak {
    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryPlayback
      withOptions:AVAudioSessionCategoryOptionDuckOthers error:nil];

    [synthesizer stopSpeakingAtBoundary:AVSpeechBoundaryImmediate];

    AVSpeechUtterance* utterance = [[AVSpeechUtterance alloc] initWithString:textToSpeak];
    utterance.voice = [AVSpeechSynthesisVoice voiceWithLanguage:locale];
    utterance.rate = (AVSpeechUtteranceMinimumSpeechRate * 1.5 + AVSpeechUtteranceDefaultSpeechRate) / 2.5 * rate * rate;
    utterance.pitchMultiplier = 1.2;
    [synthesizer speakUtterance:utterance];
}

- (void)speechSynthesizer:(AVSpeechSynthesizer*)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance*)utterance {
    //Return success message back to caller

    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
    [[AVAudioSession sharedInstance] setCategory:AVAudioSessionCategoryAmbient
      withOptions: 0 error: nil];
    [[AVAudioSession sharedInstance] setActive:YES withOptions: 0 error:nil];
}

Speech To Text:

- (void) recordUserSpeech:(NSString *) lang {
    NSLocale *locale = [[NSLocale alloc] initWithLocaleIdentifier:lang];
    self.sfSpeechRecognizer = [[SFSpeechRecognizer alloc] initWithLocale:locale];
    [self.sfSpeechRecognizer setDelegate:self];

    NSLog(@"Step1: ");
    // Cancel the previous task if it's running.
    if ( self.recognitionTask ) {
        NSLog(@"Step2: ");
        [self.recognitionTask cancel];
        self.recognitionTask = nil;
    }

    NSLog(@"Step3: ");
    [self initAudioSession];

    self.recognitionRequest = [[SFSpeechAudioBufferRecognitionRequest alloc] init];
    NSLog(@"Step4: ");

    if (!self.audioEngine.inputNode) {
        NSLog(@"Audio engine has no input node");
    }

    if (!self.recognitionRequest) {
        NSLog(@"Unable to created a SFSpeechAudioBufferRecognitionRequest object");
    }

    self.recognitionTask = [self.sfSpeechRecognizer recognitionTaskWithRequest:self.recognitionRequest resultHandler:^(SFSpeechRecognitionResult *result, NSError *error) {

        BOOL isFinal = NO;

        if (error) {
            [self stopAndRelease];
            NSLog(@"In recognitionTaskWithRequest.. Error code ::: %ld, %@", (long)error.code, error.description);
            [self sendErrorWithMessage:error.localizedFailureReason andCode:error.code];
        }

        if (result) {

            [self sendResults:result.bestTranscription.formattedString];
            isFinal = result.isFinal;
        }

        if (isFinal) {
            NSLog(@"result.isFinal: ");
            [self stopAndRelease];
            //return control to caller
        }
    }];

    NSLog(@"Step5: ");

    AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

    [self.audioEngine.inputNode installTapOnBus:0 bufferSize:1024 format:recordingFormat block:^(AVAudioPCMBuffer * _Nonnull buffer, AVAudioTime * _Nonnull when) {
        //NSLog(@"Installing Audio engine: ");
        [self.recognitionRequest appendAudioPCMBuffer:buffer];
    }];

    NSLog(@"Step6: ");

    [self.audioEngine prepare];
    NSLog(@"Step7: ");
    NSError *err;
    [self.audioEngine startAndReturnError:&err];
}
- (void) initAudioSession
{
    AVAudioSession *audioSession = [AVAudioSession sharedInstance];
    [audioSession setCategory:AVAudioSessionCategoryRecord error:nil];
    [audioSession setMode:AVAudioSessionModeMeasurement error:nil];
    [audioSession setActive:YES withOptions:AVAudioSessionSetActiveOptionNotifyOthersOnDeactivation error:nil];
}

-(void) stopAndRelease
{
    NSLog(@"Invoking SFSpeechRecognizer stopAndRelease: ");
    [self.audioEngine stop];
    [self.recognitionRequest endAudio];
    [self.audioEngine.inputNode removeTapOnBus:0];
    self.recognitionRequest = nil;
    [self.recognitionTask cancel];
    self.recognitionTask = nil;
}

Regarding the logs added, I am able to see all logs up to "Step7" printed.

When debugging the code on the device, it consistently breaks at the lines below (I have exception breakpoints set), though continuing resumes execution. However, the same thing happens during the few successful executions as well.

AVAudioFormat *recordingFormat = [self.audioEngine.inputNode outputFormatForBus:0];

[self.audioEngine prepare];

The reason is that the audio had not completely finished playing when -speechSynthesizer:didFinishSpeechUtterance: was called, which is why you get this kind of error when trying to call setActive:NO. You can't deactivate the AudioSession or change any of its settings while I/O is running. Workaround: wait several ms (how long is explained below) and then perform the AudioSession deactivation and the rest of the reconfiguration.
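
A minimal sketch of that workaround, assuming the deactivation currently done in -speechSynthesizer:didFinishSpeechUtterance: is simply deferred (the 0.1 s delay is a placeholder; see the ioBufferDuration discussion below for how to size it):

- (void)speechSynthesizer:(AVSpeechSynthesizer *)synthesizer didFinishSpeechUtterance:(AVSpeechUtterance *)utterance {
    // Defer deactivation until the hardware has actually drained the last buffer.
    // 0.1 s is a placeholder delay; derive it from ioBufferDuration as described below.
    dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(0.1 * NSEC_PER_SEC)),
                   dispatch_get_main_queue(), ^{
        NSError *error = nil;
        [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:&error];
        if (error) {
            NSLog(@"Deactivation failed: %@", error);
        }
        // Now it is safe to reconfigure the category and start SFSpeechRecognizer.
    });
}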

A few words about audio playing completion.

That might seem weird at first glance, but I've spent tons of time researching this issue. When you push the last sound chunk to the device output, you only have an approximate idea of when it will actually finish playing. Look at the AudioSession property ioBufferDuration:

The audio I/O buffer duration is the number of seconds for a single audio input/output cycle. For example, with an I/O buffer duration of 0.005 s, on each audio I/O cycle:

  • You receive 0.005 s of audio if obtaining input.
  • You must provide 0.005 s of audio if providing output.

The typical maximum I/O buffer duration is 0.93 s (corresponding to 4096 sample frames at a sample rate of 44.1 kHz). The minimum I/O buffer duration is at least 0.005 s (256 frames) but might be lower depending on the hardware in use.

So, we can interpret this value as the playback time of a single chunk. But there is still a small, uncalculated gap between that point and the actual completion of audio playback (the hardware delay). I would say you need to wait about ioBufferDuration * 1000 + delay ms to be sure the audio has finished playing (ioBufferDuration * 1000 because the duration is in seconds), where delay is some fairly small value.
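
A sketch of that timing in code (the 50 ms pad for the hardware delay is an assumption, not a documented value):

NSTimeInterval ioBufferDuration = [AVAudioSession sharedInstance].IOBufferDuration;
// One full I/O cycle in ms, plus a small pad for the hardware delay (50 ms is a guess)
double waitMs = ioBufferDuration * 1000.0 + 50.0;
dispatch_after(dispatch_time(DISPATCH_TIME_NOW, (int64_t)(waitMs * NSEC_PER_MSEC)),
               dispatch_get_main_queue(), ^{
    [[AVAudioSession sharedInstance] setActive:NO withOptions:0 error:nil];
});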

Moreover, it seems even Apple's developers are not entirely sure about the audio completion time. Take a quick look at the newer audio class AVAudioPlayerNode and its func scheduleBuffer(_ buffer: AVAudioPCMBuffer, completionHandler: AVFoundation.AVAudioNodeCompletionHandler? = nil):

@param completionHandler called after the buffer has been consumed by the player or the player is stopped. may be nil.

@discussion Schedules the buffer to be played following any previously scheduled commands. It is possible for the completionHandler to be called before rendering begins or before the buffer is played completely.
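
For illustration, the same call from Objective-C (playerNode and pcmBuffer are assumed to exist and be configured elsewhere):

[playerNode scheduleBuffer:pcmBuffer completionHandler:^{
    // Per the note above, this can fire before the buffer is audibly finished,
    // so it is not a precise "playback done" signal either.
    NSLog(@"Buffer consumed (not necessarily played to the end)");
}];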

You can read more about audio processing in Understanding the Audio Unit Render Callback Function (AudioUnit is a low-level API that provides fast access to I/O data).
