
AAC encoding using AudioConverter and writing to AVAssetWriter

I'm struggling to encode audio buffers received from an AVCaptureSession using AudioConverter and then append them to an AVAssetWriter.

I'm not getting any errors (including OSStatus responses), and the CMSampleBuffers generated seem to have valid data; however, the resulting file simply does not have any playable audio. When writing together with video, the video frames stop getting appended after a couple of frames (appendSampleBuffer() returns false, but with no AVAssetWriter.error), probably because the asset writer is waiting for the audio to catch up. I suspect it's related to the way I'm setting up the priming for AAC.

The app uses RxSwift, but I've removed the RxSwift parts so that it's easier for a wider audience to understand.

Please check out the comments in the code below for more details.

Given a settings struct:

import Foundation
import AVFoundation
import CleanroomLogger

public struct AVSettings {

let orientation: AVCaptureVideoOrientation = .Portrait
let sessionPreset                          = AVCaptureSessionPreset1280x720
let videoBitrate: Int                      = 2_000_000
let videoExpectedFrameRate: Int            = 30
let videoMaxKeyFrameInterval: Int          = 60

let audioBitrate: Int                      = 32 * 1024

/// Settings with a value of `0` mean variable rate.
/// The `mSampleRate` and `mChannelsPerFrame` values are overwritten at run-time
/// with values based on the input stream.
let audioOutputABSD = AudioStreamBasicDescription(
                            mSampleRate: AVAudioSession.sharedInstance().sampleRate,
                            mFormatID: kAudioFormatMPEG4AAC,
                            mFormatFlags: UInt32(MPEG4ObjectID.AAC_Main.rawValue),
                            mBytesPerPacket: 0,
                            mFramesPerPacket: 1024,
                            mBytesPerFrame: 0,
                            mChannelsPerFrame: 1,
                            mBitsPerChannel: 0,
                            mReserved: 0)

let audioEncoderClassDescriptions = [
    AudioClassDescription(
        mType: kAudioEncoderComponentType,
        mSubType: kAudioFormatMPEG4AAC,
        mManufacturer: kAppleSoftwareAudioCodecManufacturer) ]

}

Some helper functions:

public func getVideoDimensions(fromSettings settings: AVSettings) -> (Int, Int) {
  switch (settings.sessionPreset, settings.orientation)  {
  case (AVCaptureSessionPreset1920x1080, .Portrait): return (1080, 1920)
  case (AVCaptureSessionPreset1280x720, .Portrait): return (720, 1280)
  default: fatalError("Unsupported session preset and orientation")
  }
}

public func createAudioFormatDescription(fromSettings settings: AVSettings) -> CMAudioFormatDescription {
  var result = noErr
  var absd = settings.audioOutputABSD
  var description: CMAudioFormatDescription?
  withUnsafePointer(&absd) { absdPtr in
      result = CMAudioFormatDescriptionCreate(nil,
                                              absdPtr,
                                              0, nil,
                                              0, nil,
                                              nil,
                                              &description)
  }

  if result != noErr {
      Log.error?.message("Could not create audio format description")
  }

  return description!
}

public func createVideoFormatDescription(fromSettings settings: AVSettings) -> CMVideoFormatDescription {
  var result = noErr
  var description: CMVideoFormatDescription?
  let (width, height) = getVideoDimensions(fromSettings: settings)
  result = CMVideoFormatDescriptionCreate(nil,
                                          kCMVideoCodecType_H264,
                                          Int32(width),
                                          Int32(height),
                                          [:],
                                          &description)

  if result != noErr {
      Log.error?.message("Could not create video format description")
  }

  return description!
}

This is how the asset writer is initialized:

guard let audioDevice = defaultAudioDevice() else {
    throw RecordError.MissingDeviceFeature("Microphone")
}

guard let videoDevice = defaultVideoDevice(.Back) else {
    throw RecordError.MissingDeviceFeature("Camera")
}

let videoInput      = try AVCaptureDeviceInput(device: videoDevice)
let audioInput      = try AVCaptureDeviceInput(device: audioDevice)
let videoFormatHint = createVideoFormatDescription(fromSettings: settings)
let audioFormatHint = createAudioFormatDescription(fromSettings: settings)

let writerVideoInput = AVAssetWriterInput(mediaType: AVMediaTypeVideo,
                                          outputSettings: nil,
                                          sourceFormatHint: videoFormatHint)

let writerAudioInput = AVAssetWriterInput(mediaType: AVMediaTypeAudio,
                                          outputSettings: nil,
                                          sourceFormatHint: audioFormatHint)

writerVideoInput.expectsMediaDataInRealTime = true
writerAudioInput.expectsMediaDataInRealTime = true

let url = NSURL(fileURLWithPath: NSTemporaryDirectory(), isDirectory: true)
        .URLByAppendingPathComponent(NSProcessInfo.processInfo().globallyUniqueString)
        .URLByAppendingPathExtension("mp4")

let assetWriter =  try AVAssetWriter(URL: url, fileType: AVFileTypeMPEG4)

if !assetWriter.canAddInput(writerVideoInput) {
    throw RecordError.Unknown("Could not add video input")
}

if !assetWriter.canAddInput(writerAudioInput) {
    throw RecordError.Unknown("Could not add audio input")
}

assetWriter.addInput(writerVideoInput)
assetWriter.addInput(writerAudioInput)

And this is how audio samples are being encoded. The problem area is most likely around here. I've re-written this so that it doesn't use any Rx-isms.

var outputABSD = settings.audioOutputABSD
var outputFormatDescription: CMAudioFormatDescription! = nil
CMAudioFormatDescriptionCreate(nil, &outputABSD, 0, nil, 0, nil, nil, &outputFormatDescription)

var converter: AudioConverterRef = nil

// Indicates whether priming information has been attached to the first buffer
var primed = false

func encodeAudioBuffer(settings: AVSettings, buffer: CMSampleBuffer) throws -> CMSampleBuffer? {

  // The input stream description is needed both for creating the converter
  // and for sizing the output buffer further down
  var inputABSD = CMAudioFormatDescriptionGetStreamBasicDescription(
      CMSampleBufferGetFormatDescription(buffer)!).memory

  // Create the audio converter if it's not available
  if converter == nil {
      var classDescriptions = settings.audioEncoderClassDescriptions
      var outputABSD = settings.audioOutputABSD
      outputABSD.mSampleRate = inputABSD.mSampleRate
      outputABSD.mChannelsPerFrame = inputABSD.mChannelsPerFrame

      var result = noErr
      result = withUnsafePointer(&outputABSD) { outputABSDPtr in
          return withUnsafePointer(&inputABSD) { inputABSDPtr in
          return AudioConverterNewSpecific(inputABSDPtr,
                                          outputABSDPtr,
                                          UInt32(classDescriptions.count),
                                          &classDescriptions,
                                          &converter)
          }
      }

      if result != noErr { throw RecordError.Unknown("Could not create audio converter") }

      // At this point I made an attempt to retrieve priming info from
      // the audio converter assuming that it will give me back default values
      // I can use, but ended up with `nil`
      var primeInfo: AudioConverterPrimeInfo? = nil
      var primeInfoSize = UInt32(sizeof(AudioConverterPrimeInfo))

      // The following returns a `noErr` but `primeInfo` is still `nil`
      AudioConverterGetProperty(converter, 
                              kAudioConverterPrimeInfo,
                              &primeInfoSize, 
                              &primeInfo)

      // I've also tried to set `kAudioConverterPrimeInfo` so that it knows
      // the leading frames that are being primed, but the set didn't seem to work
      // (`noErr` but getting the property afterwards still returned `nil`)
  }

  // Need to give a big enough output buffer.
  // The assumption is that the compressed output will always be <= the input size
  let numSamples = CMSampleBufferGetNumSamples(buffer)
  // With 1024 samples at 2 bytes per packet, this becomes 2048
  let outputBufferSize = numSamples * Int(inputABSD.mBytesPerPacket)
  let outputBufferPtr = UnsafeMutablePointer<Void>.alloc(outputBufferSize)

  defer {
      outputBufferPtr.dealloc(outputBufferSize)
  }

  var result = noErr

  var outputPacketCount = UInt32(1)
  var outputData = AudioBufferList(
      mNumberBuffers: 1,
      mBuffers: AudioBuffer(
          mNumberChannels: outputABSD.mChannelsPerFrame,
          mDataByteSize: UInt32(outputBufferSize),
          mData: outputBufferPtr))

  // See below for `EncodeAudioUserData`
  var userData = EncodeAudioUserData(inputSampleBuffer: buffer,
                                      inputBytesPerPacket: inputABSD.mBytesPerPacket)

  withUnsafeMutablePointer(&userData) { userDataPtr in
      // See below for `fetchAudioProc`
      result = AudioConverterFillComplexBuffer(
                      converter,
                      fetchAudioProc,
                      userDataPtr,
                      &outputPacketCount,
                      &outputData,
                      nil)
  }

  if result != noErr {
      Log.error?.message("Error while trying to encode audio buffer, code: \(result)")
      return nil
  }

  // See below for `CMSampleBufferCreateCopy`
  guard let newBuffer = CMSampleBufferCreateCopy(buffer,
                                                 fromAudioBufferList: &outputData,
                                                 newFormatDescription: outputFormatDescription) else {
      Log.error?.message("Could not create sample buffer from audio buffer list")
      return nil
  }

  if !primed {
      primed = true
      // Simply picked 2112 samples based on convention; is there a better way to determine this?
      let samplesToPrime: Int64 = 2112
      let samplesPerSecond = Int32(settings.audioOutputABSD.mSampleRate)
      let primingDuration = CMTimeMake(samplesToPrime, samplesPerSecond)

      // Without setting the attachment the asset writer will complain about the
      // first buffer missing the `TrimDurationAtStart` attachment. Is there a way
      // to infer the value from the given `AudioBufferList`?
      CMSetAttachment(newBuffer,
                      kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                      CMTimeCopyAsDictionary(primingDuration, nil),
                      kCMAttachmentMode_ShouldNotPropagate)
  }

  return newBuffer

}

Below is the proc that fetches samples for the audio converter, and the data structure that gets passed to it:

private class EncodeAudioUserData {
  var inputSampleBuffer: CMSampleBuffer?
  var inputBytesPerPacket: UInt32

  init(inputSampleBuffer: CMSampleBuffer,
      inputBytesPerPacket: UInt32) {
      self.inputSampleBuffer   = inputSampleBuffer
      self.inputBytesPerPacket = inputBytesPerPacket
  }
}

private let fetchAudioProc: AudioConverterComplexInputDataProc = {
  (inAudioConverter,
  ioDataPacketCount,
  ioData,
  outDataPacketDescriptionPtrPtr,
  inUserData) in

  var result = noErr

  if ioDataPacketCount.memory == 0 { return noErr }

  let userData = UnsafeMutablePointer<EncodeAudioUserData>(inUserData).memory

  // If it's already been processed
  guard let buffer = userData.inputSampleBuffer else {
      ioDataPacketCount.memory = 0
      return -1
  }

  var inputBlockBuffer: CMBlockBuffer?
  var inputBufferList = AudioBufferList()
  result = CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer(
              buffer,
              nil,
              &inputBufferList,
              sizeof(AudioBufferList),
              nil,
              nil,
              0,
              &inputBlockBuffer)

  if result != noErr {
      Log.error?.message("Error while trying to retrieve buffer list, code: \(result)")
      ioDataPacketCount.memory = 0
      return result
  }

  let packetsCount = inputBufferList.mBuffers.mDataByteSize / userData.inputBytesPerPacket
  ioDataPacketCount.memory = packetsCount

  ioData.memory.mBuffers.mNumberChannels = inputBufferList.mBuffers.mNumberChannels
  ioData.memory.mBuffers.mDataByteSize = inputBufferList.mBuffers.mDataByteSize
  ioData.memory.mBuffers.mData = inputBufferList.mBuffers.mData

  if outDataPacketDescriptionPtrPtr != nil {
      outDataPacketDescriptionPtrPtr.memory = nil
  }

  return noErr
}

This is how I am converting AudioBufferLists to CMSampleBuffers:

public func CMSampleBufferCreateCopy(
    buffer: CMSampleBuffer,
    inout fromAudioBufferList bufferList: AudioBufferList,
    newFormatDescription formatDescription: CMFormatDescription? = nil)
    -> CMSampleBuffer? {

  var result = noErr

  var sizeArray: [Int] = [Int(bufferList.mBuffers.mDataByteSize)]
  // Copy timing info from the previous buffer
  var timingInfo = CMSampleTimingInfo()
  result = CMSampleBufferGetSampleTimingInfo(buffer, 0, &timingInfo)

  if result != noErr { return nil }

  var newBuffer: CMSampleBuffer?
  result = CMSampleBufferCreateReady(
      kCFAllocatorDefault,
      nil,
      formatDescription ?? CMSampleBufferGetFormatDescription(buffer),
      Int(bufferList.mNumberBuffers),
      1, &timingInfo,
      1, &sizeArray,
      &newBuffer)

  if result != noErr { return nil }
  guard let b = newBuffer else { return nil }

  CMSampleBufferSetDataBufferFromAudioBufferList(b, nil, nil, 0, &bufferList)
  return newBuffer

}

Is there anything that I am obviously doing wrong? Is there a proper way to construct CMSampleBuffers from an AudioBufferList? How do you transfer priming information from the converter to the CMSampleBuffers that you create?

For my use case I need to do the encoding manually, as the buffers will be manipulated further down the pipeline (although I've disabled all transformations after the encode in order to make sure that it works).

Any help would be much appreciated. Sorry that there's so much code to digest, but I wanted to provide as much context as possible.

Thanks in advance :)


Answer:

Turns out there were a variety of things that I was doing wrong. Instead of posting a garble of code, I'm going to try to organize this into bite-sized pieces of the things I discovered.


Samples vs Packets vs Frames

This had been a huge source of confusion for me:

  1. Each CMSampleBuffer can contain 1 or more samples (discovered via CMSampleBufferGetNumSamples).
  2. Each CMSampleBuffer that contains 1 sample represents a single audio packet.
  3. Therefore, CMSampleBufferGetNumSamples(sample) returns the number of packets contained in the given buffer.
  4. Packets contain frames. This is governed by the mFramesPerPacket property of the buffer's AudioStreamBasicDescription. For linear PCM buffers, the total size of each sample buffer is frames * bytes per frame. For compressed buffers (like AAC), there is no relationship between the total size and the frame count. (See the sketch after this list.)
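
To make this concrete, here's a minimal sketch, assuming `buffer` is a linear PCM CMSampleBuffer delivered by a capture callback (Swift 2 era API, matching the rest of this post):

let desc = CMSampleBufferGetFormatDescription(buffer)!
let absd = CMAudioFormatDescriptionGetStreamBasicDescription(desc).memory

let packets = CMSampleBufferGetNumSamples(buffer)  // number of packets
let framesPerPacket = Int(absd.mFramesPerPacket)   // 1 for LPCM, 1024 for AAC
let totalFrames = packets * framesPerPacket
// For linear PCM only: the payload size is frames * bytes per frame
let byteSize = totalFrames * Int(absd.mBytesPerFrame)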

AudioConverterComplexInputDataProc

This callback is used to retrieve more linear PCM audio data for encoding. It's imperative that you supply at least the number of packets specified by ioNumberDataPackets. Since I've been using the converter for real-time push-style encoding, I needed to ensure that each data push contains at least that minimum number of packets. For example, when encoding linear PCM (1 frame per packet) to AAC (1024 frames per packet), the converter needs at least 1024 input packets to produce a single output packet. Something like this (pseudo-code):

let minimumPackets = outputFramesPerPacket / inputFramesPerPacket
var buffers: [CMSampleBuffer] = []
while getTotalPacketCount(buffers) < minimumPackets {
  buffers.append(getNextBuffer())
}
AudioConverterFillComplexBuffer(...)
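
In my case (LPCM in, AAC out) that meant buffering incoming capture buffers until at least 1024 input packets were available. A hedged sketch of that accumulation; the names `pending` and `enqueue` are illustrative, not from my actual code:

var pending: [CMSampleBuffer] = []

func enqueue(buffer: CMSampleBuffer) -> [CMSampleBuffer]? {
    pending.append(buffer)
    // LPCM: 1 frame per packet, so the packet count equals the frame count
    let totalPackets = pending.reduce(0) { $0 + CMSampleBufferGetNumSamples($1) }
    guard totalPackets >= 1024 else { return nil } // not enough for one AAC packet yet
    let batch = pending
    pending = []
    return batch // ready to feed to AudioConverterFillComplexBuffer
}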

Slicing CMSampleBuffers

You can actually slice CMSampleBuffers if they contain multiple samples. The tool to do this is CMSampleBufferCopySampleBufferForRange. This is nice because you can provide the AudioConverterComplexInputDataProc with the exact number of packets it asks for, which makes handling timing information for the resulting encoded buffer easier. If you give the converter 1500 frames of data when it expects 1024, the resulting sample buffer will have a duration of 1024/sampleRate as opposed to 1500/sampleRate.
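
A hedged sketch of such a slice; the helper name `splitBuffer` and its shape are illustrative, not from my actual code:

private func splitBuffer(buffer: CMSampleBuffer, headPackets: Int)
    -> (head: CMSampleBuffer?, tail: CMSampleBuffer?) {
    let total = CMSampleBufferGetNumSamples(buffer)
    guard headPackets < total else { return (buffer, nil) }

    var head: CMSampleBuffer?
    var tail: CMSampleBuffer?
    // Carve off exactly the packets the converter asked for...
    CMSampleBufferCopySampleBufferForRange(
        kCFAllocatorDefault, buffer, CFRangeMake(0, headPackets), &head)
    // ...and keep the remainder for the next callback
    CMSampleBufferCopySampleBufferForRange(
        kCFAllocatorDefault, buffer, CFRangeMake(headPackets, total - headPackets), &tail)
    return (head, tail)
}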


Priming and trim duration

When doing AAC encoding, you must set the trim duration like so:

CMSetAttachment(buffer,
                kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                CMTimeCopyAsDictionary(primingDuration, kCFAllocatorDefault),
                kCMAttachmentMode_ShouldNotPropagate)

One thing I did wrong was to add the trim duration at encode time. This should be handled by your writer so that it can guarantee the information gets added to your leading audio frames.

Also, the value of kCMSampleBufferAttachmentKey_TrimDurationAtStart should never be greater than the duration of the sample buffer. An example of priming (a sketch of the bookkeeping follows the list):

  • Priming frames: 2112
  • Sample rate: 44100
  • Priming duration: 2112 / 44100 = ~0.0479s
  • First buffer, frames: 1024, trim duration: 1024 / 44100
  • Second buffer, frames: 1024, trim duration: 1088 / 44100 (the remaining 2112 - 1024 = 1088 priming frames)
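
A minimal sketch of that bookkeeping, assuming 2112 priming frames and 1024 frames per AAC packet; the running `remainingPrimingFrames` counter is my own illustrative name:

var remainingPrimingFrames: Int64 = 2112  // standard AAC encoder delay

func attachTrimIfNeeded(buffer: CMSampleBuffer, sampleRate: Int32) {
    guard remainingPrimingFrames > 0 else { return }
    // One sample per packet in a compressed buffer, 1024 frames per packet
    let frames = Int64(CMSampleBufferGetNumSamples(buffer)) * 1024
    // Never trim more than the buffer actually contains
    let trimFrames = min(remainingPrimingFrames, frames)
    remainingPrimingFrames -= trimFrames

    CMSetAttachment(buffer,
                    kCMSampleBufferAttachmentKey_TrimDurationAtStart,
                    CMTimeCopyAsDictionary(CMTimeMake(trimFrames, sampleRate), kCFAllocatorDefault),
                    kCMAttachmentMode_ShouldNotPropagate)
}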

Creating the new CMSampleBuffer

AudioConverterFillComplexBuffer has an optional outputPacketDescriptionsPtr. You should use it. It will point to a new array of packet descriptions that contains sample size information. You need this sample size information to construct the new compressed sample buffer:

// `bufferList` and `packetDescriptions` are what came back from
// AudioConverterFillComplexBuffer (see the sketch below)
let bufferList: AudioBufferList
let packetDescriptions: [AudioStreamPacketDescription]
var newBuffer: CMSampleBuffer?

CMAudioSampleBufferCreateWithPacketDescriptions(
  kCFAllocatorDefault,      // allocator
  nil,                      // dataBuffer
  false,                    // dataReady
  nil,                      // makeDataReadyCallback
  nil,                      // makeDataReadyRefcon
  formatDescription,        // formatDescription
  packetDescriptions.count, // numSamples: one sample per packet
  CMSampleBufferGetPresentationTimeStamp(buffer), // sbufPTS (first PTS)
  packetDescriptions,       // packetDescriptions
  &newBuffer)
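
For completeness, a hedged sketch of how the pieces might fit together, reusing `converter`, `fetchAudioProc`, `userDataPtr`, `outputData`, `outputFormatDescription`, and `buffer` from the question's code (error handling elided):

var outputPacketCount = UInt32(1)
// One description per requested output packet, filled in by the converter
var packetDescriptions = [AudioStreamPacketDescription](
    count: Int(outputPacketCount),
    repeatedValue: AudioStreamPacketDescription())

let status = AudioConverterFillComplexBuffer(
    converter, fetchAudioProc, userDataPtr,
    &outputPacketCount, &outputData, &packetDescriptions)

if status == noErr && outputPacketCount > 0 {
    var newBuffer: CMSampleBuffer?
    CMAudioSampleBufferCreateWithPacketDescriptions(
        kCFAllocatorDefault, nil, false, nil, nil,
        outputFormatDescription,
        Int(outputPacketCount),                         // numSamples = packet count
        CMSampleBufferGetPresentationTimeStamp(buffer), // PTS of the source buffer
        packetDescriptions,
        &newBuffer)
    // Attach the converted bytes to the newly created sample buffer
    if let newBuffer = newBuffer {
        CMSampleBufferSetDataBufferFromAudioBufferList(
            newBuffer, kCFAllocatorDefault, kCFAllocatorDefault, 0, &outputData)
    }
}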
