
How can I extract the audio data from an mp3 file?

I need to create a metadata-independent hash of an mp3 file (i.e. the same hash can be computed after a retag). How can I extract the audio data only out into memory, without actually running it through a decompressor?

MAD seems like a good starting point (http://www.underbit.com/products/mad/), but it does not obviously expose a function for doing this.

Any pointers appreciated!

How can I extract the audio data only out into memory, without actually running it through a decompressor?

You can't extract the audio data without decompressing it - it's compressed! However, if you just want the raw compressed stream, read on!

The typical mp3 audio file will be divided into sections:
[likely metatag]
[possible junk]
[possible XING/LAME tags [possible more junk]]
[mp3 audio frames]
[possible metatag]

Likely metatag: Most mp3 audio files will have an ID3 tag at their head. Be aware that some users may tag their mp3 files with different tagging formats, such as APE, so you will need to account for that too.

Possible junk: Some mp3 audio files have been tagged, re-tagged and converted so many times that the metatag header may not give you an accurate offset to the first audio frame, as remnants of previous tags can be left behind. foobar2000 has an option to fix this.

Possible XING/LAME tags: These are contained within an mp3 audio frame, though they do not contain actual audio. madplay has code to show you how to read and parse these frames. The XING/LAME header may have a frame count, so it's worth parsing these headers. Again, if the file has been through many different taggers and editors, there may be several malformed, invalid audio frames here.
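To make the XING/LAME check concrete, here is a minimal sketch of testing whether a frame carries such a tag. The function name `frame_has_xing_tag` is hypothetical. It assumes an MPEG Layer III frame without CRC protection, where the tag identifier ("Xing" or "Info") sits directly after the side information. It is an illustration, not madplay's code.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the frame starting at `frame` appears to carry a Xing/Info tag.
   Assumption: Layer III frame with no CRC, so the tag follows the 4-byte
   header plus the side information. */
int frame_has_xing_tag(const unsigned char *frame, size_t len)
{
    if (len < 4) return 0;
    int mpeg1 = ((frame[1] >> 3) & 0x03) == 0x03;   /* version ID 11 = MPEG-1 */
    int mono  = ((frame[3] >> 6) & 0x03) == 0x03;   /* channel mode 11 = mono */
    /* Side-information size depends on MPEG version and channel mode. */
    size_t side = mpeg1 ? (mono ? 17 : 32) : (mono ? 9 : 17);
    size_t off = 4 + side;
    if (len < off + 4) return 0;
    return memcmp(frame + off, "Xing", 4) == 0 ||
           memcmp(frame + off, "Info", 4) == 0;
}
```

A frame that passes this check holds no audio, so whether to include it in the hashed range is a choice to make once and apply consistently.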

MP3 audio frames: The actual compressed stream, broken into 'frames'. Each frame begins with a sync bit pattern of 11 set bits (0xFF followed by a byte whose top three bits are set), often written 0xFFE.
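Because the sync pattern alone is short enough to occur by chance, it helps to also reject header values that the MPEG audio format reserves. The sketch below shows one way to do that; `mp3_header_plausible` is a hypothetical helper name, and the check is a heuristic rather than a full header parser.

```c
/* Returns 1 if the 4 bytes at p look like a plausible MPEG audio frame header. */
int mp3_header_plausible(const unsigned char *p)
{
    if (p[0] != 0xFF || (p[1] & 0xE0) != 0xE0) return 0; /* 11 sync bits      */
    if (((p[1] >> 3) & 0x03) == 0x01) return 0; /* version ID 01 is reserved  */
    if (((p[1] >> 1) & 0x03) == 0x00) return 0; /* layer 00 is reserved       */
    if (((p[2] >> 4) & 0x0F) == 0x0F) return 0; /* bitrate index 1111 invalid */
    if (((p[2] >> 2) & 0x03) == 0x03) return 0; /* sample rate 11 is reserved */
    return 1;
}
```

For example, the bytes FF FB 90 00 pass, while a run of FF FF FF FF is rejected because its bitrate index is 1111.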

Possible metatag: It's not uncommon to find more metatags at the end of the file. ID3v1, APE and Lyrics tags can all be found here.

To find the audio frames offset, you will need to parse any metatag headers, then begin looking for the sync bit pattern. You can't just begin looking for the sync pattern from the start of the file, as not all taggers correctly support unsynchronization, so the metatag itself may contain the 0xFFE pattern.
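The procedure just described can be sketched as follows, assuming the whole file is already in memory. `find_first_frame` and `id3v2_body_size` are hypothetical names. Only ID3v2 tags are skipped, so a leading APE tag would need its own handling.

```c
#include <stddef.h>
#include <string.h>

/* Decode the 28-bit "syncsafe" size stored in bytes 6..9 of an ID3v2 header. */
static size_t id3v2_body_size(const unsigned char *hdr)
{
    return ((size_t)(hdr[6] & 0x7F) << 21) | ((size_t)(hdr[7] & 0x7F) << 14) |
           ((size_t)(hdr[8] & 0x7F) << 7)  |  (size_t)(hdr[9] & 0x7F);
}

/* Returns the offset of the first sync pattern after any leading ID3v2 tags,
   or len if none is found. */
size_t find_first_frame(const unsigned char *data, size_t len)
{
    size_t pos = 0;

    /* Skip each ID3v2 tag by its declared size instead of scanning through it,
       since the tag body may itself contain the sync pattern. */
    while (len - pos >= 10 && memcmp(data + pos, "ID3", 3) == 0) {
        size_t skip = 10 + id3v2_body_size(data + pos);
        if (data[pos + 5] & 0x10) skip += 10;  /* ID3v2.4 footer flag is set */
        if (skip > len - pos) return len;      /* truncated tag */
        pos += skip;
    }

    /* Step over leftover junk until the 11-bit sync pattern appears. */
    for (; pos + 1 < len; pos++)
        if (data[pos] == 0xFF && (data[pos + 1] & 0xE0) == 0xE0)
            return pos;
    return len;
}
```

Junk after the tag can also contain a false sync, so a more robust version would validate the full header at the returned offset before trusting it.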

Once you have the offset to the first audio frame, you should look at the end of the file and calculate how much non-audio data is there so you know when to stop parsing the audio. Once you have the offset to the start of the audio data, and the offset to the end of the audio data, you can pass the audio data through your hash/checksum function!
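With both offsets in hand, the hashing step is a single pass over that byte range. Here is a minimal sketch, assuming the file is in memory. `hash_audio_range` is a hypothetical name, and 64-bit FNV-1a is only a stand-in for MD5 or whichever hash you actually use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hash the bytes in data[start, end) with 64-bit FNV-1a (a stand-in hash). */
uint64_t hash_audio_range(const unsigned char *data, size_t start, size_t end)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a 64-bit offset basis */
    for (size_t i = start; i < end; i++) {
        h ^= data[i];
        h *= 0x100000001b3ULL;            /* FNV-1a 64-bit prime */
    }
    return h;
}
```

Two files with the same audio frames but different tags give the same value, provided the start and end offsets are computed the same way for both.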

You can use ffmpeg to directly access the audio content by using copy mode. The format does not matter, since the API will give you a container with the raw data (again, in copy mode only). You can also demux and decode in case you have a video or you want to work on the decoded audio data.

Check out ffmpeg's examples for a quick intro on how to do this. By "using ffmpeg" I mean not the command-line tool but the ffmpeg libraries (libavformat, libavcodec) from within C/C++, even though I think you could also do this from the command line with the ffmpeg tool by sending its output to stdout and piping it to md5sum or something equivalent (if you're a Unix user, that is).

The special case "-acodec copy" tells ffmpeg to copy the audio stream as-is instead of decoding and re-encoding it. In other words, no transcoding of the audio occurs.

What kind of audio data? The raw decoded PCM stream? The individual MP3 frames? What if it's an MP3 encapsulated in a .wav? It could still have a .mp3 extension, but have the full .wav wrapper around it.

Stripping off an ID3v1 tag is simple - it's just 128 bytes at the end of the file. ID3v2 is a bit harder - it's variable length and prepended to the start of the MP3, so you'd have to parse out the length field (which is 4 bytes where only the lowest 7 bits of each byte are used, giving a 28-bit maximum length for the tag). The .wav wrapper would be harder still - I don't know any details about what .wav imposes as metadata.
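The ID3v1 case is short enough to show in full. This is a sketch under the assumption that the file is in memory and that ID3v1 is the only trailing tag; `length_without_id3v1` is a hypothetical name. APE or Lyrics tags at the end are not handled.

```c
#include <stddef.h>
#include <string.h>

/* Returns the data length with a trailing 128-byte ID3v1 tag, if any, excluded. */
size_t length_without_id3v1(const unsigned char *data, size_t len)
{
    /* An ID3v1 tag occupies exactly the last 128 bytes and starts with "TAG". */
    if (len >= 128 && memcmp(data + len - 128, "TAG", 3) == 0)
        return len - 128;
    return len;
}
```

Checking only at the fixed position 128 bytes from the end avoids mistaking the bytes "TAG" inside the audio stream for a tag.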

I recently needed to solve this problem as well (detect duplicate mp3 files which had differing ID3 tags). The easiest thing to do was use ffmpeg to make a copy of the mp3 file with all of the ID3 tags stripped, and then take an md5 sum of that.

See https://github.com/pepaslabs/mp3md5sum

ffmpeg alone can calculate the MD5 hash of the audio segment of an audio file, i.e. sans metadata.

Use:

ffmpeg -v error -i "$file" -acodec copy -f md5 -

Note that FLAC already has an MD5 hash stored as metadata.

I wrote this bare-bones little snippet for a Linux box with an old mp3 player that couldn't handle tags. What is left is just the mp3 headers and data (on stdout as coded). You can use that for your md5.

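#define _GNU_SOURCE   // for memmem() on glibc
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#define DUMPTAGS      // when defined, the stripped tag bytes go to stderr
int main(int argc, char **argv){
   unsigned char buf[4096];
   ssize_t len;
   int fd;
   if (argc < 2 || (fd = open(argv[1],O_RDONLY)) < 0) return 1;
   while ((len=read(fd,buf,10)) == 10){ // handle ID3v2 tags (maybe multiple)
      if (buf[0]=='I' && buf[1]=='D' && buf[2]=='3'){
         // syncsafe size: 7 bits from each of the 4 size bytes
         long left = buf[9]|(buf[8] << 7)|(buf[7] << 14)|(buf[6] << 21);
         while (left > 0){ // the tag may be larger than buf, so skip it in chunks
            len=read(fd,buf,left < (long)sizeof buf ? (size_t)left : sizeof buf);
            if (len <= 0) return 1;
#ifdef DUMPTAGS
            write(2,buf,len);
#endif
            left -= len;
         }
      } else break;
   }
   while (len > 0 && write(1,buf,len) > 0){ // buf holds audio not yet written
      unsigned char tag[3] = {'T','A','G'}, *end;
      len=read(fd,buf,sizeof buf);
      if (len <= 0) break;
      // naive scan: assumes "TAG" never occurs inside the audio data and
      // never straddles a chunk boundary
      end=(unsigned char *)memmem(buf,len,tag,3);
      if (end){ //handle ID3v1 tag (should only be 1)
         write(1,buf,end-buf);
#ifdef DUMPTAGS
         write(2,end,len-(end-buf));
#endif
         break;
      }
   }
   close(fd);
   return 0;
}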
