
Audio stream packaging in an HLS stream

When I use ffmpeg to create HLS segments, say 4-second video TS segments, how does the corresponding audio stream get packaged with those TS segments? Will the audio segments also be 4 seconds long? If not, why not? What logic is used to package the audio stream within the TS segments?
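For concreteness, the kind of command I mean is something like this (a sketch only; the input file name and codec choices are assumptions, with -hls_time setting the 4-second target):

    ffmpeg -i input.mp4 \
        -c:v libx264 -c:a aac \
        -hls_time 4 \
        -hls_playlist_type vod \
        -hls_segment_filename 'seg_%03d.ts' \
        out.m3u8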

The quick answer is that the audio and video segments do not need to be exactly the same length, although there are advantages to keeping them the same - e.g. easier for a player to parse, and simpler DASH and HLS manifest files.

To understand why they are often not aligned, it is useful to look at the audio side: audio frequently uses the AAC codec, whose frames each contain a fixed 1024 samples. The audio sampling rate therefore determines how many frames fall within a given time period.

For example:

  • 48000 Hz sampling rate
  • 4-second segments will contain 4 * 48000 = 192000 samples.
  • 192000 / 1024 = 187.5 frames - i.e. not a whole number of AAC frames
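The same arithmetic is easy to repeat for other sampling rates and segment durations - a minimal Python sketch (the 44100 Hz case is just an extra illustration):

    # Number of AAC frames needed to fill a segment of the given duration.
    AAC_FRAME_SIZE = 1024  # samples per AAC frame

    def aac_frames_per_segment(sample_rate_hz, segment_s):
        samples = sample_rate_hz * segment_s  # e.g. 48000 * 4 = 192000
        return samples / AAC_FRAME_SIZE       # e.g. 192000 / 1024 = 187.5

    print(aac_frames_per_segment(48000, 4))   # 187.5      -> not a whole number
    print(aac_frames_per_segment(44100, 4))   # 172.265625 -> not whole either

A fractional result means the last AAC frame straddles the segment boundary, so the audio segment cannot end at exactly 4.0 seconds.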

It is possible to pick video and audio frame rates that align with a given segment duration - the Fraunhofer FOKUS team has some examples and good background:

[Image: examples of aligned segment durations, from the Fraunhofer FOKUS post]

( https://websites.fraunhofer.de/video-dev/why-and-how-to-align-media-segments-for-abr-streaming/ )
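To make alignment concrete: a segment duration works for both streams when it is a whole multiple of both the audio frame duration and the video frame duration. A minimal sketch that searches for such durations (48 kHz AAC audio and 25 fps video are assumptions here; substitute your own rates):

    from fractions import Fraction

    AAC_FRAME_SIZE = 1024   # samples per AAC frame
    SAMPLE_RATE = 48000     # Hz (assumed)
    FPS = 25                # video frames per second (assumed)

    # One AAC frame lasts 1024/48000 s and one video frame lasts 1/25 s;
    # a segment is aligned when its duration is a whole multiple of both.
    audio_frame = Fraction(AAC_FRAME_SIZE, SAMPLE_RATE)  # 8/375 s
    video_frame = Fraction(1, FPS)                       # 1/25 s

    for n in range(1, 250):
        d = n * audio_frame                     # duration of n audio frames
        if d > 5:
            break
        if (d / video_frame).denominator == 1:  # whole number of video frames too
            print(f"{float(d):.2f} s -> {n} audio frames, {d * FPS} video frames")

With these assumed rates, the aligned durations come out as multiples of 0.32 s (0.32, 0.64, ..., 4.80 s) - notice that a round 4.0-second segment is not among them.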

It is ultimately a trade-off: arguably more complexity on the server side to produce alignment, versus more complexity on the player side if players have to deal with larger manifests and unaligned segments.

Both approaches should work, however, and it is certainly very common to have fixed video segment sizes, e.g. 2 seconds, that don't align exactly with the audio segment sizes.
