
Piping AudioKit Microphone to Google Speech-to-Text

I'm trying to get AudioKit to pipe the microphone to Google's Speech-to-Text API as shown here, but I'm not entirely sure how to go about it.

To prepare the audio for the Speech-to-Text engine, you need to set up the encoding and pass it through as chunks. In the example Google uses, they rely on Apple's AVFoundation, but I would like to use AudioKit so I can do some pre-processing, such as cutting off low amplitudes.

I believe the right way to do this is with a Tap.

First, I should match the format by:

var asbd = AudioStreamBasicDescription()
asbd.mSampleRate = 16000.0
asbd.mFormatID = kAudioFormatLinearPCM
asbd.mFormatFlags = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked
asbd.mBytesPerPacket = 2
asbd.mFramesPerPacket = 1
asbd.mBytesPerFrame = 2
asbd.mChannelsPerFrame = 1
asbd.mBitsPerChannel = 16

AudioKit.format = AVAudioFormat(streamDescription: &asbd)!

And then create a tap, such as:

open class TestTap {
    internal let bufferSize: UInt32 = 1_024

    @objc public init(_ input: AKNode?) {
        input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in

         // do work here

        }
    }
}

But I can't work out the right way to handle the data and send it to the Google Speech-to-Text API in real time via its streamAudioData method while using AudioKit. Or perhaps I'm going about this the wrong way?

UPDATE:

I've created a Tap as follows:

open class TestTap {

    internal var audioData =  NSMutableData()
    internal let bufferSize: UInt32 = 1_024

    func toData(buffer: AVAudioPCMBuffer) -> NSData {
        let channelCount = 2  // given PCMBuffer channel count is
        let channels = UnsafeBufferPointer(start: buffer.floatChannelData, count: channelCount)
        return NSData(bytes: channels[0], length:Int(buffer.frameCapacity * buffer.format.streamDescription.pointee.mBytesPerFrame))
    }

    @objc public init(_ input: AKNode?) {

        input?.avAudioNode.installTap(onBus: 0, bufferSize: bufferSize, format: AudioKit.format) { buffer, _ in
            self.audioData.append(self.toData(buffer: buffer) as Data)

            // We recommend sending samples in 100ms chunks (from Google)
            let chunkSize: Int /* bytes/chunk */ = Int(0.1 /* seconds/chunk */
                * AudioKit.format.sampleRate /* samples/second */
                * 2 /* bytes/sample */ )

            if self.audioData.length > chunkSize {
                SpeechRecognitionService
                    .sharedInstance
                    .streamAudioData(self.audioData,
                                     completion: { response, error in
                                        if let error = error {
                                            print("ERROR: \(error.localizedDescription)")
                                            SpeechRecognitionService.sharedInstance.stopStreaming()
                                        } else if let response = response {
                                            print(response)
                                        }
                    })
                self.audioData = NSMutableData()
            }

        }
    }
}

And in viewDidLoad:, I'm setting up AudioKit with:

AKSettings.sampleRate = 16_000
AKSettings.bufferLength = .shortest
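
For context, a minimal sketch of how the rest of that setup might look, assuming an AKMicrophone node feeds the tap. The AKBooster silent sink and the variable names are my assumptions, not from the original post, and AudioKit.start() is treated as throwing as in later AudioKit 4.x releases:

import AudioKit

var mic: AKMicrophone!
var tap: TestTap!

override func viewDidLoad() {
    super.viewDidLoad()

    // Configure AudioKit before creating nodes or starting the engine
    AKSettings.sampleRate = 16_000
    AKSettings.bufferLength = .shortest

    mic = AKMicrophone()                       // microphone input node
    tap = TestTap(mic)                         // installs the tap defined above
    AudioKit.output = AKBooster(mic, gain: 0)  // silent sink so the engine has an output

    do {
        try AudioKit.start()
    } catch {
        print("AudioKit failed to start: \(error)")
    }
}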

However, Google complains:

ERROR: Audio data is being streamed too fast. Please stream audio data approximately at real time.

I've tried changing multiple parameters, such as the chunk size, to no avail.

I found the solution here: downsampling the tap's buffers to the 16 kHz Int16 target format with an AVAudioConverter before streaming keeps the outgoing data rate at roughly real time.

My final code for the Tap is:

open class GoogleSpeechToTextStreamingTap {

    internal var converter: AVAudioConverter!

    @objc public init(_ input: AKNode?, sampleRate: Double = 16000.0) {

        // Target format for streaming recognition: 16 kHz, mono, 16-bit integer PCM
        let format = AVAudioFormat(commonFormat: AVAudioCommonFormat.pcmFormatInt16, sampleRate: sampleRate, channels: 1, interleaved: false)!

        self.converter = AVAudioConverter(from: AudioKit.format, to: format)
        self.converter?.sampleRateConverterAlgorithm = AVSampleRateConverterAlgorithm_Normal
        self.converter?.sampleRateConverterQuality = .max

        let sampleRateRatio = AKSettings.sampleRate / sampleRate
        let inputBufferSize = 4410 //  100ms of 44.1K = 4410 samples.

        input?.avAudioNode.installTap(onBus: 0, bufferSize: AVAudioFrameCount(inputBufferSize), format: nil) { buffer, time in

            // Downsample the incoming buffer into a 16-bit buffer at the target rate
            let capacity = Int(Double(buffer.frameCapacity) / sampleRateRatio)
            let bufferPCM16 = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: AVAudioFrameCount(capacity))!

            var error: NSError? = nil
            self.converter?.convert(to: bufferPCM16, error: &error) { inNumPackets, outStatus in
                outStatus.pointee = AVAudioConverterInputStatus.haveData
                return buffer
            }

            // Package the converted Int16 samples as raw bytes (2 bytes per sample)
            let channel = UnsafeBufferPointer(start: bufferPCM16.int16ChannelData!, count: 1)
            let data = Data(bytes: channel[0], count: capacity * 2)

            SpeechRecognitionService
                .sharedInstance
                .streamAudioData(data,
                                 completion: { response, error in
                                    if let error = error {
                                        print("ERROR: \(error.localizedDescription)")
                                        SpeechRecognitionService.sharedInstance.stopStreaming()
                                    } else if let response = response {
                                        print(response)
                                    }
                })
        }
    }
}
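
One thing the snippets above don't show is tearing the stream down. A minimal sketch, assuming you keep a reference to the tapped node; the function name is hypothetical, removeTap(onBus:) is the standard AVAudioNode call, and stopStreaming() is the same service method used above:

func stopGoogleStreaming(on input: AKNode?) {
    // Detach the tap installed in init so no further buffers are delivered
    input?.avAudioNode.removeTap(onBus: 0)
    // Close the stream to the Speech-to-Text API
    SpeechRecognitionService.sharedInstance.stopStreaming()
}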

You can likely record using AKNodeRecorder and pass the buffer from the resulting AKAudioFile to the API. If you want something more real-time, you could try installing a tap on the avAudioNode property of the AKNode you want to record and pass the buffers to the API continuously.
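
For the record-then-send route, a rough sketch of what that might look like, assuming AudioKit 4.x; the variable names, the silent AKBooster sink, and the final conversion step are my assumptions, not taken from this answer:

import AudioKit
import AVFoundation

let mic = AKMicrophone()
var recorder: AKNodeRecorder?

func startRecording() throws {
    AudioKit.output = AKBooster(mic, gain: 0)   // silent sink so the engine runs
    try AudioKit.start()
    recorder = try AKNodeRecorder(node: mic)
    try recorder?.record()
}

func finishRecordingAndSend() throws {
    recorder?.stop()
    guard let file = recorder?.audioFile else { return }

    // AKAudioFile is an AVAudioFile subclass, so it can be read into a PCM buffer
    let buffer = AVAudioPCMBuffer(pcmFormat: file.processingFormat,
                                  frameCapacity: AVAudioFrameCount(file.length))!
    try file.read(into: buffer)

    // Convert `buffer` to the encoding the API expects (e.g. 16 kHz LINEAR16) and send it
}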

However, I'm curious why you see the need for pre-processing; I'm sure the Google API is already well optimized for recordings produced by the sample code you mentioned.

I've had a lot of success/fun using the iOS Speech API. Not sure whether you have a reason to use the Google API, but if you haven't already, I'd consider checking it out to see whether it might better serve your needs.
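
For comparison, the built-in route goes through SFSpeechRecognizer. A minimal sketch, assuming speech-recognition authorization has already been granted and that the same tap buffers are available; none of this comes from the original answer:

import Speech

let recognizer = SFSpeechRecognizer(locale: Locale(identifier: "en-US"))!
let request = SFSpeechAudioBufferRecognitionRequest()

// Start a recognition task that receives partial and final results
let task = recognizer.recognitionTask(with: request) { result, error in
    if let result = result {
        print(result.bestTranscription.formattedString)
    } else if let error = error {
        print("ERROR: \(error.localizedDescription)")
    }
}

// Inside the tap closure, feed the same AVAudioPCMBuffers to the request:
// request.append(buffer)

// When the microphone stops, signal the end of audio:
// request.endAudio()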

Hope this helps!
