When I use ffmpeg to create HLS segments, say 4-second video TS segments, how does the corresponding audio stream get packaged within those TS segments? Will the audio segments also be 4 seconds long? If not, why not? What logic is used to package the audio stream within TS segments?
The quick answer is that the audio and video segments do not need to be exactly the same duration, although there are advantages to making them the same, e.g. easier for a player to parse, and simpler DASH and HLS manifest files.
To understand why they are often not aligned, it is useful to look at the audio side: audio frequently uses the AAC codec, whose frames each contain 1024 samples. The audio sampling rate therefore determines how many frames fall within a given time period.
For example, at a 44.1 kHz sampling rate each AAC frame covers 1024/44100 ≈ 23.2 ms, so a 4-second segment would need about 172.27 frames. A segment can only contain a whole number of frames, so the audio segment cannot be exactly 4 seconds long.
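The arithmetic above can be checked with a few lines of Python (the 44.1 kHz rate and 4-second target are just illustrative values, not something ffmpeg mandates):

```python
# How many 1024-sample AAC frames fit in a nominal 4-second segment
# at a 44.1 kHz sampling rate?
SAMPLE_RATE = 44100           # samples per second
SAMPLES_PER_AAC_FRAME = 1024  # fixed by the AAC codec
SEGMENT_SECONDS = 4           # nominal video segment duration

frame_duration = SAMPLES_PER_AAC_FRAME / SAMPLE_RATE   # ≈ 0.02322 s per frame
frames_per_segment = SEGMENT_SECONDS / frame_duration  # ≈ 172.27 frames

# Not a whole number, so audio segment boundaries cannot land
# exactly on the 4-second video boundary.
print(frame_duration, frames_per_segment)
```

Because 172.27 is not an integer, the packager has to round the audio segment to a whole number of frames, leaving it slightly shorter or longer than the video segment.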
It is possible to pick video frame rates and audio sampling rates that align with a given segment duration; the Fraunhofer FOKUS team have some examples and good background:
( https://websites.fraunhofer.de/video-dev/why-and-how-to-align-media-segments-for-abr-streaming/ )
It is ultimately a trade-off: arguably more complexity on the server side to produce aligned segments, versus more complexity on the player side if players have to deal with larger manifests and unaligned segments.
Both approaches work in practice, and it is certainly very common to have a fixed video segment size, e.g. 2 seconds, that does not align exactly with the audio segment size.
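As a sketch of the alignment arithmetic the Fraunhofer article describes, the helper below (a hypothetical function, assuming 48 kHz AAC audio and 30 fps video as example inputs) lists segment durations that contain a whole number of both audio frames and video frames:

```python
from fractions import Fraction
from math import gcd, lcm

def aligned_segment_durations(sample_rate=48000, samples_per_frame=1024,
                              fps=30, max_seconds=6):
    """Return durations (seconds) holding whole numbers of both
    audio and video frames. Hypothetical helper for illustration."""
    audio_frame = Fraction(samples_per_frame, sample_rate)  # one AAC frame
    video_frame = Fraction(1, fps)                          # one video frame
    # The LCM of two reduced fractions p1/q1 and p2/q2 is
    # lcm(p1, p2) / gcd(q1, q2): the smallest duration that is an
    # integer multiple of both frame durations.
    step = Fraction(lcm(audio_frame.numerator, video_frame.numerator),
                    gcd(audio_frame.denominator, video_frame.denominator))
    durations = []
    d = step
    while d <= max_seconds:
        durations.append(float(d))
        d += step
    return durations

print(aligned_segment_durations())
```

With these example rates the smallest aligned duration is 8/15 s (≈ 0.533 s), and 3.2 s is an aligned choice near the common 2-4 second range: it holds exactly 150 AAC frames and 96 video frames. A 4-second segment, by contrast, is not on the grid, which is why packagers that want exact alignment pick durations like 3.2 s rather than round numbers.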