How to encode and decode real-time audio with OpusCodec on iOS?


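  1. Record real-time audio on an iOS device (iPhone)
  2. Encode this audio data as Opus and send it to the server over a WebSocket
  3. Decode the received data back to PCM
  4. Play the audio coming from the WebSocket server on the iOS device (iPhone)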


 import AVFoundation

 let engine = AVAudioEngine()
 let input: AVAudioInputNode = engine.inputNode
 // Typically Float32 at the hardware sample rate (often 44.1 or 48 kHz)
 let format: AVAudioFormat = input.outputFormat(forBus: 0)
 input.installTap(onBus: 0, bufferSize: 8192, format: format) { buf, when in
     // 'buf' contains audio captured from the input node at time 'when'
 }

 // Start the engine
 try engine.start()
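If the tap data is passed to the encoder unchanged, there is a sample-rate mismatch: the tap delivers Float32 at the hardware rate (typically 44.1 or 48 kHz), while the OpusManager code opens the encoder at SAMPLE_RATE 16000. AVAudioConverter is the proper tool for the conversion on iOS. The C sketch below only illustrates the idea with linear interpolation. `resample_linear` is a hypothetical helper; it has no anti-aliasing filter, and each call restarts at the first input sample, so it assumes chunks whose length is a multiple of the rate ratio (3 for 48 kHz to 16 kHz).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: linear-interpolation resampler for mono Float32 audio.
 * No anti-aliasing filter is applied, so downsampling can alias; prefer
 * AVAudioConverter in a real app. Each call starts at input sample 0, so feed
 * it chunks whose length is a multiple of in_rate / out_rate.
 * Returns the number of samples written to out (at most out_cap).
 */
size_t resample_linear(const float *in, size_t in_len, double in_rate,
                       float *out, size_t out_cap, double out_rate)
{
    assert(in_rate > 0.0 && out_rate > 0.0);
    const double step = in_rate / out_rate;   /* input samples per output sample */
    size_t n = 0;
    while (n < out_cap) {
        double pos = (double)n * step;        /* computed from n to avoid drift */
        size_t i = (size_t)pos;
        if (i >= in_len) break;
        double frac = pos - (double)i;
        /* At the last input sample there is no right neighbour: hold it. */
        float next = (i + 1 < in_len) ? in[i + 1] : in[i];
        out[n++] = (float)(in[i] + (next - in[i]) * frac);
    }
    return n;
}
```

With a 48 kHz input and a 16 kHz output this keeps every third sample (the interpolation fraction is always 0), which is exactly why a real low-pass filter, as AVAudioConverter applies, matters for quality.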

I use this function to convert an AVAudioPCMBuffer to Data:

func toData(PCMBuffer: AVAudioPCMBuffer) -> Data {
    // Copies only channel 0 (mono). For the standard deinterleaved Float32
    // format, mBytesPerFrame is 4: one sample of one channel.
    let channelCount = 1
    let channels = UnsafeBufferPointer(start: PCMBuffer.floatChannelData, count: channelCount)
    let byteCount = Int(PCMBuffer.frameLength * PCMBuffer.format.streamDescription.pointee.mBytesPerFrame)
    return Data(bytes: channels[0], count: byteCount)
}
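The tap's buffers are generally not a multiple of FRAME_SIZE, and `encode:` only encodes whole frames, so the samples left over at the end of each buffer are lost unless they are carried into the next call. A minimal carry-over accumulator in C, assuming 320-sample (20 ms at 16 kHz) mono Float32 frames; `FrameAccumulator` and `acc_push` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ACC_FRAME_SIZE 320   /* 20 ms of 16 kHz mono, same as FRAME_SIZE */

/* Hypothetical carry-over buffer: holds samples that do not yet fill a frame. */
typedef struct {
    float  pending[ACC_FRAME_SIZE];
    size_t count;                     /* samples currently in pending */
} FrameAccumulator;

/*
 * Appends n samples. Every completed frame is copied to out_frames, which must
 * have room for (acc->count + n) / ACC_FRAME_SIZE frames. Leftover samples stay
 * in acc for the next call. Returns the number of complete frames written.
 */
size_t acc_push(FrameAccumulator *acc, const float *samples, size_t n,
                float *out_frames)
{
    assert(acc->count < ACC_FRAME_SIZE);
    size_t frames = 0;
    while (n > 0) {
        size_t space = ACC_FRAME_SIZE - acc->count;
        size_t take = n < space ? n : space;
        memcpy(acc->pending + acc->count, samples, take * sizeof(float));
        acc->count += take;
        samples += take;
        n -= take;
        if (acc->count == ACC_FRAME_SIZE) {
            memcpy(out_frames + frames * ACC_FRAME_SIZE, acc->pending,
                   sizeof acc->pending);
            frames++;
            acc->count = 0;
        }
    }
    return frames;
}
```

Each 320-sample frame written to out_frames can then be passed to opus_encode_float(encoder, frame, 320, packet, max_bytes).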

I found the Opus library through the CocoaPod libopus.

I have searched a lot for how to use the Opus codec on iOS but haven't found a solution.

How do I encode and decode this data with the Opus codec? Do I need a jitter buffer? If so, how do I use one on iOS?
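On the jitter buffer: WebSocket runs over TCP, so packets arrive complete and in order, but their arrival times still vary, so it usually helps to hold a few packets before starting playback and to rebuffer on underrun. Reordering and loss handling only become important over UDP. Below is a minimal fixed-size sketch in C. It assumes the sender attaches a 16-bit sequence number to each packet, which the code in this question does not do; `JitterBuffer`, `jb_put` and `jb_get` are hypothetical, and the sizes (16 slots, 3-packet prefill) are arbitrary choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JB_SLOTS     16    /* window of 16 packets (320 ms of 20 ms frames) */
#define JB_MAX_BYTES 1500  /* assumed upper bound on one Opus packet */
#define JB_PREFILL   3     /* packets held before playback starts (~60 ms) */

typedef enum { JB_WAIT, JB_PACKET, JB_MISSING } JbResult;

/* Hypothetical jitter buffer; zero-initialise before use. */
typedef struct {
    uint8_t  data[JB_SLOTS][JB_MAX_BYTES];
    size_t   len[JB_SLOTS];
    bool     filled[JB_SLOTS];
    uint16_t next_seq;    /* sequence number to play next */
    bool     have_first;  /* next_seq has been set from the first packet */
    bool     playing;     /* prefill reached */
    int      count;       /* packets currently stored */
} JitterBuffer;

/* Stores a packet. Returns false if it is too large, a duplicate, already
 * played (late), or JB_SLOTS or more ahead of the playout position. */
bool jb_put(JitterBuffer *jb, uint16_t seq, const uint8_t *pkt, size_t n)
{
    if (n > JB_MAX_BYTES) return false;
    if (!jb->have_first) { jb->next_seq = seq; jb->have_first = true; }
    uint16_t ahead = (uint16_t)(seq - jb->next_seq);  /* modulo 2^16 */
    if (ahead >= JB_SLOTS) return false;
    size_t slot = seq % JB_SLOTS;
    if (jb->filled[slot]) return false;
    memcpy(jb->data[slot], pkt, n);
    jb->len[slot] = n;
    jb->filled[slot] = true;
    jb->count++;
    return true;
}

/* Called once per 20 ms playout tick. JB_PACKET: a packet was copied to out
 * (room for JB_MAX_BYTES) and *n set. JB_MISSING: that sequence number never
 * arrived; conceal it. JB_WAIT: play silence while (re)buffering. */
JbResult jb_get(JitterBuffer *jb, uint8_t *out, size_t *n)
{
    assert(jb->count >= 0 && jb->count <= JB_SLOTS);
    if (!jb->playing) {
        if (jb->count < JB_PREFILL) return JB_WAIT;
        jb->playing = true;
    }
    if (jb->count == 0) {            /* underrun: rebuffer instead of skipping */
        jb->playing = false;
        return JB_WAIT;
    }
    size_t slot = jb->next_seq % JB_SLOTS;
    jb->next_seq++;
    if (!jb->filled[slot]) return JB_MISSING;
    memcpy(out, jb->data[slot], jb->len[slot]);
    *n = jb->len[slot];
    jb->filled[slot] = false;
    jb->count--;
    return JB_PACKET;
}
```

On JB_MISSING the playout side can let Opus conceal the gap by passing a NULL payload, e.g. opus_decode_float(decoder, NULL, 0, pcm, 320, 0).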

This code works with the Opus codec, but the voice is not clear:

#import "OpusManager.h"
#import <opus/opus.h>

#define SAMPLE_RATE 16000
#define CHANNELS 1
// APPLICATION and BITRATE are not defined in the snippet; these values are
// assumed from the answer below (VOIP application, 16 kbit/s).
#define APPLICATION OPUS_APPLICATION_VOIP
#define BITRATE 16000

/*
 * Audio frame size
 * Frames are defined by duration. Each call must pass exactly one frame of
 * audio (a multiple of 2.5 ms: 2.5, 5, 10, 20, 40, 60 ms).
 * Fs/ms   2.5     5       10      20      40      60
 * 8kHz    20      40      80      160     320     480
 * 16kHz   40      80      160     320     640     960
 * 24kHz   60      120     240     480     960     1440
 * 48kHz   120     240     480     960     1920    2880
 */
#define FRAME_SIZE 320   // 20 ms at 16 kHz

#define MAX_PACKET_BYTES    (FRAME_SIZE * CHANNELS * sizeof(float))
#define MAX_FRAME_SIZE      (FRAME_SIZE * CHANNELS * sizeof(float))

typedef opus_int16 OPUS_DATA_SIZE_T;

@implementation OpusManager {
    OpusEncoder *_encoder;
    OpusDecoder *_decoder;
}

- (instancetype)init {
    self = [super init];
    if (self) {
        int error = OPUS_OK;

        // opus_encoder_create allocates and initializes in one step, so the
        // separate malloc + opus_encoder_init (which was then leaked) is not needed.
        _encoder = opus_encoder_create(SAMPLE_RATE, CHANNELS, APPLICATION, &error);
        if (error != OPUS_OK) return nil;
        _decoder = opus_decoder_create(SAMPLE_RATE, CHANNELS, &error);
        if (error != OPUS_OK) return nil;

        opus_encoder_ctl(_encoder, OPUS_SET_BITRATE(BITRATE));
        opus_encoder_ctl(_encoder, OPUS_SET_COMPLEXITY(10));
        opus_encoder_ctl(_encoder, OPUS_SET_SIGNAL(OPUS_SIGNAL_VOICE));
        opus_encoder_ctl(_encoder, OPUS_SET_VBR(0));
        opus_encoder_ctl(_encoder, OPUS_SET_DTX(1));
        // OPUS_SET_BANDWIDTH expects an OPUS_BANDWIDTH_* constant, not a value
        // in Hz (12000 is rejected with OPUS_BAD_ARG). MEDIUMBAND is a 6 kHz
        // passband, i.e. a 12 kHz effective sample rate.
        opus_encoder_ctl(_encoder, OPUS_SET_BANDWIDTH(OPUS_BANDWIDTH_MEDIUMBAND));
        opus_encoder_ctl(_encoder, OPUS_SET_INBAND_FEC(1));
        opus_encoder_ctl(_encoder, OPUS_SET_PACKET_LOSS_PERC(1));
        opus_encoder_ctl(_encoder, OPUS_SET_FORCE_CHANNELS(CHANNELS));
    }
    return self;
}

- (void)dealloc {
    opus_encoder_destroy(_encoder);
    opus_decoder_destroy(_decoder);
}

// Expects 16 kHz mono Float32 PCM. The tap delivers the hardware rate
// (often 44.1 or 48 kHz), so convert first, e.g. with AVAudioConverter.
- (NSData *)encode:(NSData *)PCM {
    // toData() produces Float32, so step through the buffer in float units;
    // an opus_int16 pointer would advance only half a frame per step.
    const float *PCMPtr = (const float *)PCM.bytes;
    NSUInteger PCMSize = PCM.length / sizeof(float);
    const float *PCMEnd = PCMPtr + PCMSize;
    NSMutableData *mutData = [NSMutableData data];
    unsigned char encodedPacket[MAX_PACKET_BYTES];

    // Record opus block size
    OPUS_DATA_SIZE_T encodedBytes = 0;

    // Encode whole frames only; a trailing partial frame is dropped here
    while (PCMPtr + FRAME_SIZE * CHANNELS <= PCMEnd) {
        encodedBytes = opus_encode_float(_encoder, PCMPtr, FRAME_SIZE, encodedPacket, MAX_PACKET_BYTES);
        if (encodedBytes <= 0) {
            NSLog(@"ERROR: opus_encode_float failed: %d", encodedBytes);
            return nil;
        }

        // Save the opus block size, then the opus data
        [mutData appendBytes:&encodedBytes length:sizeof(encodedBytes)];
        [mutData appendBytes:encodedPacket length:encodedBytes];

        PCMPtr += FRAME_SIZE * CHANNELS;   // advance to the next frame
    }

    return mutData.length > 0 ? mutData : nil;
}


- (NSData *)decode:(NSData *)opus {
    const unsigned char *opusPtr = (const unsigned char *)opus.bytes;
    const unsigned char *opusEnd = opusPtr + opus.length;

    NSMutableData *mutData = [NSMutableData data];

    float decodedPacket[MAX_FRAME_SIZE * CHANNELS];
    int decodedSamples = 0;

    // Save data for opus block size
    OPUS_DATA_SIZE_T nBytes = 0;

    while (opusEnd - opusPtr >= (ptrdiff_t)sizeof(nBytes)) {
        // Take out the opus block size data (memcpy avoids an unaligned read)
        memcpy(&nBytes, opusPtr, sizeof(nBytes));
        opusPtr += sizeof(nBytes);
        if (nBytes <= 0 || nBytes > opusEnd - opusPtr) {
            NSLog(@"ERROR: truncated or invalid opus block");
            return nil;
        }

        decodedSamples = opus_decode_float(_decoder, opusPtr, nBytes, decodedPacket, MAX_FRAME_SIZE, 0);
        if (decodedSamples <= 0) {
            NSLog(@"ERROR: opus_decode_float failed: %d", decodedSamples);
            return nil;
        }

        // The decoder writes floats, so size the copy with sizeof(float)
        [mutData appendBytes:decodedPacket length:decodedSamples * CHANNELS * sizeof(float)];

        opusPtr += nBytes;
    }
    return mutData.length > 0 ? mutData : nil;
}

@end
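The encode:/decode: pair frames each packet with a 2-byte length written in the device's native byte order (OPUS_DATA_SIZE_T). That works between iPhones, but a server written in another language has to parse the same layout. A sketch in C that fixes the prefix as little-endian and checks bounds while reading; the layout and the names `frame_append`/`frame_next` are assumptions for illustration, not part of Opus. Since every WebSocket message already carries its own length, sending one Opus packet per message would avoid the prefix entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Appends [len low byte][len high byte][payload] at buf + used.
 * Returns the number of bytes written, or 0 if it does not fit. */
size_t frame_append(uint8_t *buf, size_t cap, size_t used,
                    const uint8_t *pkt, uint16_t len)
{
    assert(used <= cap);
    if (cap - used < (size_t)len + 2) return 0;
    buf[used]     = (uint8_t)(len & 0xFF);
    buf[used + 1] = (uint8_t)(len >> 8);
    memcpy(buf + used + 2, pkt, len);
    return (size_t)len + 2;
}

/* Reads the packet at *pos. Returns 1 and sets *pkt/*len (pointing into buf)
 * on success, 0 at the end of the data, -1 if the data is truncated.
 * Never reads past buf + size. */
int frame_next(const uint8_t *buf, size_t size, size_t *pos,
               const uint8_t **pkt, uint16_t *len)
{
    assert(*pos <= size);
    size_t left = size - *pos;
    if (left == 0) return 0;
    if (left < 2) return -1;
    uint16_t n = (uint16_t)(buf[*pos] | (buf[*pos + 1] << 8));
    if (left - 2 < n) return -1;
    *pkt = buf + *pos + 2;
    *len = n;
    *pos += (size_t)n + 2;
    return 1;
}
```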


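Try reducing the bandwidth or setting a higher bitrate. I think 16 kbit/s may be too low for 12 kHz-bandwidth mono audio. With the application set to VOIP, I think it is better to leave the bandwidth on auto (OPUS_AUTO, the default). There may be other problems as well, but "it doesn't sound good" is not enough information to analyze.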
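To put the answer's numbers in perspective: at a constant 16 kbit/s with 20 ms frames, each packet carries only 40 bytes. A small C sketch of the arithmetic; `frame_samples` and `cbr_packet_bytes` are hypothetical helpers, and durations are given in tenths of a millisecond so that 2.5 ms stays an integer. `frame_samples` reproduces the frame-size table from the question's comment block.

```c
#include <assert.h>

/* Samples per channel in one frame. Duration is in tenths of a millisecond
 * so that 2.5 ms is exact (25); e.g. 20 ms = 200. Hypothetical helper. */
int frame_samples(int sample_rate, int tenths_ms)
{
    assert(sample_rate > 0 && tenths_ms > 0);
    return sample_rate * tenths_ms / 10000;
}

/* Bytes per packet at a constant bitrate in bit/s: bitrate * seconds / 8. */
int cbr_packet_bytes(int bitrate, int tenths_ms)
{
    assert(bitrate > 0 && tenths_ms > 0);
    return bitrate * tenths_ms / 80000;
}
```

For example, cbr_packet_bytes(16000, 200) is 40, while raising BITRATE to 24000 gives 60 bytes per 20 ms packet.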




