
Remove initial silence from recorded audio file of wave type

Can anyone help me remove the initial silence in a recorded audio file?

I am fetching the data bytes of the wav file and, after ignoring the first 44 header bytes, finding the end of the range of zero bytes, which are silent in the wave file.

After that, from the total data bytes, the end of the silent byte range, and the total duration of the file, I calculate the silence time of the audio file and trim that much time from the audio file.

But the issue is that there is still some silent part remaining in the audio file.

So I am not sure if I missed something.

- (double)processAudio:(float)totalFileDuration withFilePathURL:(NSURL *)filePathURL{
    NSMutableData *data = [NSMutableData dataWithContentsOfURL:filePathURL];
    NSMutableData *Wave1 = [NSMutableData dataWithData:[data subdataWithRange:NSMakeRange(44, [data length] - 44)]];
    uint8_t *bytePtr = (uint8_t *)[Wave1 bytes];
    NSInteger totalData = [Wave1 length] / sizeof(uint8_t);
    int endRange = 0;
    for (int i = 0; i < totalData; i++) {
        // walk the leading run of zero bytes (assumed silence)
        if (bytePtr[i] == 0) {
            endRange = i;
        } else {
            break;
        }
    }

    double silentAudioDuration = (((float)endRange / (float)totalData) * totalFileDuration);
    return silentAudioDuration;
}
- (void)trimAudioFileWithInputFilePath:(NSString *)inputPath toOutputFilePath:(NSString *)outputPath{
    // build the input file URL
    NSString *strInputFilePath = inputPath;
    NSURL *audioFileInput = [NSURL fileURLWithPath:strInputFilePath];

    // build the output file URL, forcing an .m4a extension
    NSString *strOutputFilePath = [outputPath stringByDeletingPathExtension];
    strOutputFilePath = [strOutputFilePath stringByAppendingString:@".m4a"];
    NSURL *audioFileOutput = [NSURL fileURLWithPath:strOutputFilePath];
    newPath = strOutputFilePath;

    if (!audioFileInput || !audioFileOutput){
        // handle invalid input/output URLs here
    }

    [[NSFileManager defaultManager] removeItemAtURL:audioFileOutput error:NULL];
    AVAsset *asset = [AVAsset assetWithURL:audioFileInput];
    CMTime audioDuration = asset.duration;
    float audioDurationSeconds = CMTimeGetSeconds(audioDuration);

    AVAssetExportSession *exportSession = [AVAssetExportSession exportSessionWithAsset:asset presetName:AVAssetExportPresetAppleM4A];

    if (exportSession == nil){
        // handle failure to create the export session here
    }

    // duration of the leading silence, computed from the raw bytes
    float startTrimTime = [self processAudio:audioDurationSeconds withFilePathURL:audioFileInput];
    // export up to the end of the file
    float endTrimTime = audioDurationSeconds;

    recordingDuration = audioDurationSeconds - startTrimTime;

    CMTime startTime = CMTimeMake((int)(floor(startTrimTime * 100)), 100);
    CMTime stopTime = CMTimeMake((int)(ceil(endTrimTime * 100)), 100);
    CMTimeRange exportTimeRange = CMTimeRangeFromTimeToTime(startTime, stopTime);

    exportSession.outputURL = audioFileOutput;
    exportSession.outputFileType = AVFileTypeAppleM4A;
    exportSession.timeRange = exportTimeRange;

    [exportSession exportAsynchronouslyWithCompletionHandler:^{
         if (AVAssetExportSessionStatusCompleted == exportSession.status){
             // trimmed file was written successfully
         }
         else if (AVAssetExportSessionStatusFailed == exportSession.status){
             // inspect exportSession.error for the failure reason
         }
     }];
}

What am I doing wrong here?

Is it possible that you don't have complete silence in your files? Perhaps your samples have a value of 1 or 2 or 3, which technically is not silent, but is very quiet.

Wave files are stored as signed numbers if 16 bits and unsigned if 8 bits, yet you are processing and casting your data to an unsigned byte: uint8_t *bytePtr = (uint8_t *)[Wave1 bytes];

You need to know the format of your wave file, which can be obtained from the header. (It might use sample sizes of, say, 8 bits, 16 bits, 24 bits, etc.)

If it is 16 bits and mono, you need to use:

int16_t *ptr = (int16_t *)[Wave1 bytes];

Your loop advances one byte at a time, so you would need to adjust it to step by your frame size instead.

You also don't consider mono versus stereo. In general, your processAudio function needs more detail: it should consider the number of channels per frame (stereo or mono) and the size of each sample.
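
For instance, a corrected scan for 16-bit mono PCM might look like the sketch below. It is only a sketch: the method name is hypothetical, it assumes a plain 44-byte header and signed 16-bit mono samples, and the threshold value is illustrative, meant to treat the very quiet samples (values of 1, 2, or 3) mentioned above as silence:

// Minimal sketch: leading-silence duration for 16-bit mono PCM
// with a plain 44-byte header. kSilenceThreshold is illustrative.
static const int16_t kSilenceThreshold = 4;

- (double)leadingSilenceDuration:(double)totalFileDuration fileURL:(NSURL *)fileURL {
    NSData *data = [NSData dataWithContentsOfURL:fileURL];
    NSData *sampleData = [data subdataWithRange:NSMakeRange(44, [data length] - 44)];
    const int16_t *samples = (const int16_t *)[sampleData bytes]; // signed, not uint8_t
    NSInteger totalSamples = [sampleData length] / sizeof(int16_t);

    NSInteger firstAudible = totalSamples;
    for (NSInteger i = 0; i < totalSamples; i++) {
        // anything outside the threshold band around zero counts as audible
        if (samples[i] > kSilenceThreshold || samples[i] < -kSilenceThreshold) {
            firstAudible = i;
            break;
        }
    }
    return ((double)firstAudible / (double)totalSamples) * totalFileDuration;
}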

Here is a wave header with iOS types. You can cast the first 44 bytes to it and read the header fields so you know what you are dealing with.

typedef struct waveHeader_t
{
    // RIFF chunk
    char        chunkID[4];             ///< should always contain "RIFF" (big endian)
    uint32_t    chunkSize;              ///< total file length minus 8 (little endian!)
    char        format[4];              ///< should be "WAVE" (big endian)

    // fmt subchunk
    char        subChunk1ID[4];         ///< "fmt " (big endian)
    uint32_t    subChunk1Size;          ///< 16 for PCM format
    uint16_t    audioFormat;            ///< 1 for PCM format
    uint16_t    numChannels;            ///< number of channels
    uint32_t    sampleRate;             ///< sampling frequency
    uint32_t    byteRate;               ///< sampleRate * numChannels * bitsPerSample/8
    uint16_t    blockAlign;             ///< frame size in bytes
    uint16_t    bitsPerSample;          ///< bits per sample

    // data subchunk
    char        subChunk2ID[4];         ///< should always contain "data"
    uint32_t    subChunk2Size;          ///< number of bytes of sample data that follow

    ///< sample data follows this.....
} waveHeader_t;
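
Assuming the file really does begin with this canonical 44-byte header (see the caveat about Apple-generated files below), reading it is just a cast; the variable names here are illustrative:

NSData *fileData = [NSData dataWithContentsOfURL:filePathURL];
const waveHeader_t *header = (const waveHeader_t *)[fileData bytes];

uint16_t channels      = header->numChannels;   // 1 = mono, 2 = stereo
uint16_t bitsPerSample = header->bitsPerSample; // e.g. 8, 16, 24
uint16_t frameSize     = header->blockAlign;    // channels * bitsPerSample / 8
NSInteger totalFrames  = header->subChunk2Size / frameSize;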

So your to-do list is:

  • Extract the fields from the header
  • Specifically, get the number of channels and the bits per channel (note: BITS per channel)
  • Point to the data with a pointer of the appropriate size and loop through it one frame at a time, as in the sketch below. (A mono frame has one sample that could be 8, 16, 24, etc. bits. A stereo frame has two samples, each of which could be 8, 16, or 24 bits. E.g. LR LR LR LR LR LR would be 6 frames.)
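
Putting the pieces together, a format-aware scan could look like the following. It is only a sketch, assuming integer PCM with Foundation imported: it handles just the 8-bit (unsigned, silence at the 0x80 midpoint) and 16-bit (signed little-endian) cases, treats a frame as silent only if every channel in it is silent, and uses illustrative thresholds:

// Sketch: count the leading silent frames, given fields from the header.
// Assumes integer PCM; the thresholds are illustrative.
static NSInteger SilentLeadingFrames(const uint8_t *sampleData,
                                     NSInteger totalFrames,
                                     uint16_t channels,
                                     uint16_t bitsPerSample) {
    NSInteger frame = 0;
    for (; frame < totalFrames; frame++) {
        BOOL frameIsSilent = YES;
        for (uint16_t ch = 0; ch < channels; ch++) {
            if (bitsPerSample == 8) {
                // 8-bit WAV samples are unsigned; silence sits at 0x80
                uint8_t s = sampleData[frame * channels + ch];
                if (abs((int)s - 0x80) > 2) { frameIsSilent = NO; break; }
            } else if (bitsPerSample == 16) {
                // 16-bit WAV samples are signed little-endian
                const int16_t *samples = (const int16_t *)sampleData;
                if (abs(samples[frame * channels + ch]) > 4) { frameIsSilent = NO; break; }
            }
        }
        if (!frameIsSilent) break; // first audible frame found
    }
    return frame;
}

The leading-silence time in seconds is then silentFrames / (double)header->sampleRate, which is what your trim code needs as its start time.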

The header of an Apple-generated wave file is usually not 44 bytes in length. Some Apple-generated headers are 4k bytes in length. You have to inspect the wave RIFF header for an extra 'FLLR' filler chunk. If you don't skip past this extra filler padding, you will end up with roughly an extra tenth of a second of silence (or potentially even bad data).
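
One robust way to deal with that is to stop hard-coding offset 44 and instead walk the RIFF chunks until you hit the 'data' chunk, which skips 'FLLR' and any other filler automatically. A sketch, assuming a well-formed little-endian RIFF/WAVE file:

// Sketch: locate the 'data' chunk, skipping 'FLLR' and other filler chunks.
// Assumes a well-formed little-endian RIFF/WAVE file.
static BOOL FindDataChunk(NSData *file, NSUInteger *outOffset, uint32_t *outSize) {
    const uint8_t *bytes = (const uint8_t *)[file bytes];
    NSUInteger length = [file length];
    NSUInteger pos = 12; // skip "RIFF", the file size, and "WAVE"
    while (pos + 8 <= length) {
        uint32_t chunkSize;
        memcpy(&chunkSize, bytes + pos + 4, sizeof(chunkSize));
        if (memcmp(bytes + pos, "data", 4) == 0) {
            *outOffset = pos + 8; // sample data starts right after the chunk header
            *outSize   = chunkSize;
            return YES;
        }
        pos += 8 + chunkSize + (chunkSize & 1); // chunks are word-aligned
    }
    return NO;
}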
