FFMpeg library: how to precisely seek in an audio file
Using the FFMpeg library in my Android app, I am trying to understand how to seek to a very precise position in an audio file.
For example, I want to set the current position in my file to frame #1234567 (in a file encoded at 44100 Hz), which is equivalent to seeking to 27994.717 milliseconds.
To achieve that, here is what I tried:
// this:
av_seek_frame(formatContext, -1, 27994717, 0);
// or this:
av_seek_frame(formatContext, -1, 27994717, AVSEEK_FLAG_ANY);
// or even this:
avformat_seek_file(formatContext, -1, 27994617, 27994717, 27994817, 0);
Using a position in microseconds gives me the best result so far.
But for some reason, the positioning is not totally accurate: when I extract the samples from the audio file, they don't start exactly at the expected position. There is a slight delay of about 30-40 milliseconds (surprisingly, even if I seek to position 0).
Am I using the function the right way, or is it even the right function?
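For reference, the microsecond value passed to the seek calls above can be derived from the sample index with integer arithmetic. This is a minimal sketch; `sample_to_microseconds` is a hypothetical helper name, not part of FFmpeg, and it assumes a constant sample rate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an FFmpeg function): converts a sample index
// to microseconds, rounding to the nearest microsecond.
// Integer arithmetic avoids floating-point rounding on long files.
std::int64_t sample_to_microseconds(std::int64_t sample_index, int sample_rate) {
    assert(sample_index >= 0 && sample_rate > 0);
    return (sample_index * 1000000 + sample_rate / 2) / sample_rate;
}
```

With the numbers from the question, `sample_to_microseconds(1234567, 44100)` yields 27994717, the value used in the calls above.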
EDIT
Here is how I get the position:
AVPacket packet;
AVStream *stream = NULL;
AVFormatContext *formatContext = NULL;
AVCodec *dec = NULL;
// initialization:
avformat_open_input(&formatContext, filename, NULL, NULL);
avformat_find_stream_info(formatContext, NULL);
int audio_stream_index = av_find_best_stream(formatContext, AVMEDIA_TYPE_AUDIO, -1, -1, &dec, 0);
stream = formatContext->streams[audio_stream_index];
...
// later, when I extract samples, here is how I get my position, in microseconds:
av_read_frame(formatContext, &packet);
long position = (long) (1000000 * (packet.pts * ((float) stream->time_base.num / stream->time_base.den)));
Thanks to that piece of code, I can get the position of the beginning of the current frame (frame = block of samples, whose size depends on the audio format: 1152 samples for mp3, 128 to 1152 for ogg, ...)
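A side note on the conversion itself: multiplying by a `float` ratio loses a little precision, and on a 32-bit Android build `long` is 32 bits wide, so a microsecond count overflows after roughly 35 minutes. The sketch below does the same conversion in 64-bit integer arithmetic. `pts_to_microseconds` is a hypothetical helper written for illustration; in FFmpeg itself, `av_rescale_q(packet.pts, stream->time_base, AV_TIME_BASE_Q)` serves the same purpose and also guards against intermediate overflow. Neither change by itself is likely to explain a 30 ms offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not an FFmpeg function): converts a timestamp
// expressed in units of num/den seconds into microseconds, rounding to
// the nearest microsecond. Unlike av_rescale_q, the intermediate product
// can overflow for extremely large timestamps.
std::int64_t pts_to_microseconds(std::int64_t pts, int num, int den) {
    assert(num > 0 && den > 0);
    const std::int64_t scaled = pts * num * 1000000;
    // Round half away from zero so negative timestamps behave symmetrically.
    return (scaled >= 0 ? scaled + den / 2 : scaled - den / 2) / den;
}
```

For example, a pts of 1234567 in a 1/44100 time base gives 27994717 microseconds.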
The problem is: the value I get in position is not accurate: it's actually about 30 ms late. For example, when it says 1000000, the actual position is approximately 1030000...
What did I do wrong? Is it a bug in FFMpeg?
Thanks for your help.
It depends on the codec. For example, AAC has a resolution of 1024 samples per frame, no matter what the sample rate is; it also has priming samples that may be discarded. MP3 has 576 or 1152 samples per frame depending on the MPEG version (1152 for MPEG-1 Layer III, 576 for MPEG-2 and MPEG-2.5 Layer III).
If you need perfection, use an uncompressed format like wav or riff.
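To make the granularity concrete, here is a small sketch that finds which codec frame contains a given sample and how many samples precede it inside that frame. It assumes a fixed frame size and ignores priming samples and encoder delay, so it is an idealised model rather than what any particular demuxer does; `FramePosition` and `locate_sample` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type for this illustration.
struct FramePosition {
    std::int64_t frame_index;  // index of the codec frame containing the sample
    int offset_in_frame;       // samples to discard at the start of that frame
};

// Assumes every frame holds exactly samples_per_frame samples and that
// the stream has no priming samples or encoder delay.
FramePosition locate_sample(std::int64_t sample_index, int samples_per_frame) {
    assert(sample_index >= 0 && samples_per_frame > 0);
    return {sample_index / samples_per_frame,
            static_cast<int>(sample_index % samples_per_frame)};
}
```

For sample 1234567, this gives frame 1071 with an offset of 775 samples for 1152-sample frames, and frame 1205 with an offset of 647 samples for 1024-sample frames. At 44100 Hz a 1152-sample frame lasts about 26 ms, so a seek that can only land on frame boundaries may be off by up to that much before any encoder delay is taken into account.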
Late, but hopefully it helps someone. The idea is to save the timestamp when seeking and then compare AVPacket->pts with this value (you can do that with AVStream->dts, but it wasn't giving good results in my experiments). If pts is still lower than our target timestamp, then skip the leading samples using the AV_PKT_DATA_SKIP_SAMPLES side data of the packet (AVPacket->side_data).
Code for the seeking method:
void audio_decoder::seek(float seconds) {
    auto stream = m_format_ctx->streams[m_packet->stream_index];
    // Convert the seconds provided by the user to a timestamp in the
    // stream's time base, then save it for later.
    m_target_ts = av_rescale_q(static_cast<int64_t>(seconds * AV_TIME_BASE), AV_TIME_BASE_Q, stream->time_base);
    avcodec_flush_buffers(m_codec_ctx.get());
    // Seek within the given stream index, using a timestamp in that
    // stream's time base. AVSEEK_FLAG_BACKWARD makes sure we never land
    // *after* the requested timestamp.
    // av_seek_frame() returns >= 0 on success, so only negative values are errors.
    int err = av_seek_frame(m_format_ctx.get(), m_packet->stream_index, m_target_ts, AVSEEK_FLAG_BACKWARD);
    if(err < 0) {
        error("audio_decoder: Error while seeking ({})", av_err_str(err));
    }
}
And code for the decoding method:
void audio_decoder::decode() {
    <...>
    while(is_decoding) {
        // Read data as usual.
        av_read_frame(m_format_ctx.get(), m_packet.get());
        // Here is the juicy part. We were seeking, but the seek
        // wasn't precise enough, so we need to drop some samples.
        if(m_packet->pts > 0 && m_target_ts > 0 && m_packet->pts < m_target_ts) {
            auto stream = m_format_ctx->streams[m_packet->stream_index];
            // Convert the timestamp delta (in stream time base units) to a
            // number of samples, i.e. to a time base of 1 / sample_rate.
            int64_t skip_samples = av_rescale_q(m_target_ts - m_packet->pts, stream->time_base, AVRational{1, m_codec_ctx->sample_rate});
            // Next step: attach side data to the packet; it tells the
            // decoder to drop samples from the start of this packet.
            uint8_t *data = av_packet_get_side_data(m_packet.get(), AV_PKT_DATA_SKIP_SAMPLES, nullptr);
            if(!data) {
                data = av_packet_new_side_data(m_packet.get(), AV_PKT_DATA_SKIP_SAMPLES, 10);
                // Clear the new buffer so the "discard at end" field starts at zero.
                if(data) std::memset(data, 0, 10); // needs <cstring>
            }
            // Layout of the 10 bytes: u32le samples to skip at the start,
            // u32le samples to discard at the end, u8 reason for start skip,
            // u8 reason for end skip. You can check the parameters here:
            // https://ffmpeg.org/doxygen/trunk/group__lavc__packet.html#ga9a80bfcacc586b483a973272800edb97
            if(data) {
                // Write the count as little-endian regardless of host byte order
                // (AV_WL32 comes from <libavutil/intreadwrite.h>).
                AV_WL32(data, static_cast<uint32_t>(skip_samples));
                data[8] = 0;
            }
        }
        // Send packet as usual.
        avcodec_send_packet(m_codec_ctx.get(), m_packet.get());
        // Proceed to receiving frames as usual, nothing to change there.
    }
    <...>
}
If it's unclear without context, you can check the same code in my project (audio_decoder.cpp).
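The two calculations in the answer above, converting a timestamp delta to a sample count and filling the 10-byte side-data buffer, can be tried out without FFmpeg. The sketch below is a standalone illustration under stated assumptions: `samples_to_skip` and `make_skip_payload` are hypothetical helpers, and the byte layout follows the documented AV_PKT_DATA_SKIP_SAMPLES format (u32le skip at start, u32le discard at end, u8 reason for start skip, u8 reason for end skip).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of samples between the packet's pts and the
// target timestamp, both expressed in a time base of tb_num/tb_den seconds.
std::int64_t samples_to_skip(std::int64_t target_ts, std::int64_t packet_pts,
                             int tb_num, int tb_den, int sample_rate) {
    assert(tb_num > 0 && tb_den > 0 && sample_rate > 0);
    if (packet_pts >= target_ts) return 0;  // already at or past the target
    const std::int64_t delta = target_ts - packet_pts;
    // Rescale from the stream time base to 1/sample_rate, rounding to nearest.
    return (delta * tb_num * sample_rate + tb_den / 2) / tb_den;
}

// Hypothetical helper: builds the 10-byte payload with only the
// "skip at start" field set; all other fields stay zero.
std::array<std::uint8_t, 10> make_skip_payload(std::uint32_t skip_start) {
    std::array<std::uint8_t, 10> payload{};  // zero-initialised
    for (std::size_t i = 0; i < 4; ++i) {
        // Little-endian: least significant byte first.
        payload[i] = static_cast<std::uint8_t>((skip_start >> (8 * i)) & 0xFF);
    }
    return payload;
}
```

With a 1/44100 time base, a packet starting at sample 1233792 and a target of 1234567 give 775 samples to skip, which is encoded as the bytes 0x07 0x03 0x00 0x00 followed by six zero bytes.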