
Parsing HLS m3u8 Regex

I am trying to parse an HLS m3u8 file, and where I am stuck is matching the m3u8 links. If URI= exists (as in #EXT-X-I-FRAME-STREAM-INF), I want the link in quotation marks; if it doesn't (as in #EXT-X-STREAM-INF), I want the link on the next line.

Text:

#EXT-X-STREAM-INF:BANDWIDTH=263851,CODECS="mp4a.40.2, avc1.4d400d",RESOLUTION=416x234,AUDIO="bipbop_audio",SUBTITLES="subs"
gear1/prog_index.m3u8 <== new line link
#EXT-X-I-FRAME-STREAM-INF:URI="gear1/iframe_index.m3u8",CODECS="avc1.4d400d",BANDWIDTH=28451
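For comparison, the rule described above can also be applied line by line instead of with one large regex. This is a swapped-in technique, not the regex the question asks for. `extract_uris` is a hypothetical helper, and the sketch assumes one tag per line, that an #EXT-X-STREAM-INF URI sits on the next non-blank line, and that the sample playlist is used without the `<==` annotation.

```python
import re

# Quoted URI attribute, e.g. URI="gear1/iframe_index.m3u8".
URI_ATTR = re.compile(r'URI="([^"]*)"')


def extract_uris(playlist: str) -> list[str]:
    """Hypothetical helper: collect variant URIs (Python 3.9+ type hints).

    Assumes one tag per line; blank lines are ignored.
    """
    uris = []
    lines = [line.strip() for line in playlist.splitlines() if line.strip()]
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-I-FRAME-STREAM-INF:"):
            # I-frame variants carry their URI as a quoted attribute.
            m = URI_ATTR.search(line)
            if m:
                uris.append(m.group(1))
        elif line.startswith("#EXT-X-STREAM-INF:"):
            # Regular variants: the URI is the following line, unless that is a tag.
            if i + 1 < len(lines) and not lines[i + 1].startswith("#"):
                uris.append(lines[i + 1])
    return uris


sample = """#EXT-X-STREAM-INF:BANDWIDTH=263851,CODECS="mp4a.40.2, avc1.4d400d",RESOLUTION=416x234,AUDIO="bipbop_audio",SUBTITLES="subs"
gear1/prog_index.m3u8
#EXT-X-I-FRAME-STREAM-INF:URI="gear1/iframe_index.m3u8",CODECS="avc1.4d400d",BANDWIDTH=28451
"""

print(extract_uris(sample))  # ['gear1/prog_index.m3u8', 'gear1/iframe_index.m3u8']
```

This avoids engine-specific regex features entirely, at the cost of relying on the playlist's line structure.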


Regex:

(?:#EXT-X-STREAM-INF:|#EXT-X-I-FRAME-STREAM-INF:)(?:BANDWIDTH=(?<BANDWIDTH>\d+),?|CODECS=(?<CODECS>"[^"]*"),?|RESOLUTION=(?<RESOLUTION>\d+x\d+),?|AUDIO=(?<AUDIO>"[^"]*"),?|SUBTITLES=(?<SUBTITLES>"[^"]*"),?|URI=(?<URI>"[^"]*"),?)*
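To see where the question's pattern falls short, here is a sketch that ports it to Python's `re` module. The only change is the group syntax: Python spells named groups `(?P<name>...)` rather than `(?<name>...)`. The sample string is the playlist above without the annotation.

```python
import re

# The question's pattern, ported to Python (named groups use (?P<name>...)).
QUESTION_PATTERN = re.compile(
    r'(?:#EXT-X-STREAM-INF:|#EXT-X-I-FRAME-STREAM-INF:)'
    r'(?:BANDWIDTH=(?P<BANDWIDTH>\d+),?'
    r'|CODECS=(?P<CODECS>"[^"]*"),?'
    r'|RESOLUTION=(?P<RESOLUTION>\d+x\d+),?'
    r'|AUDIO=(?P<AUDIO>"[^"]*"),?'
    r'|SUBTITLES=(?P<SUBTITLES>"[^"]*"),?'
    r'|URI=(?P<URI>"[^"]*"),?)*'
)

sample = (
    '#EXT-X-STREAM-INF:BANDWIDTH=263851,CODECS="mp4a.40.2, avc1.4d400d",'
    'RESOLUTION=416x234,AUDIO="bipbop_audio",SUBTITLES="subs"\n'
    'gear1/prog_index.m3u8\n'
    '#EXT-X-I-FRAME-STREAM-INF:URI="gear1/iframe_index.m3u8",'
    'CODECS="avc1.4d400d",BANDWIDTH=28451\n'
)

for m in QUESTION_PATTERN.finditer(sample):
    # The stream line has no URI= attribute, so URI is None there, and the
    # link on the following line is never part of the match.
    print(m.group("BANDWIDTH"), m.group("URI"))
```

The I-frame match gets its quoted URI, but the #EXT-X-STREAM-INF match stops at the end of its line, so `gear1/prog_index.m3u8` is never captured.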

Regex demo

A quick fix for your pattern will look like this:

  • Capture the #EXT-X-STREAM-INF part into Group 1
  • Add the (?J) modifier to allow several named capturing groups with the same name
  • Add a conditional construct that, if Group 1 matched, also captures the whole next line

The pattern will look like

(?J)(?:(#EXT-X-STREAM-INF)|#EXT-X-I-FRAME-STREAM-INF):(?:BANDWIDTH=(?<BANDWIDTH>\d+),?|CODECS=(?<CODECS>"[^"]*"),?|RESOLUTION=(?<RESOLUTION>\d+x\d+),?|AUDIO=(?<AUDIO>"[^"]*"),?|SUBTITLES=(?<SUBTITLES>"[^"]*"),?|URI=(?<URI>"[^"]*"),?)*(?(1)\R(?<URI>(?:(?!#EXT)\S)+))
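The answer's pattern is written for PCRE. Python's `re` supports the conditional `(?(1)...)` and the tempered greedy token, but not `(?J)` or `\R`, so the sketch below makes two assumed adaptations: the next-line link goes into a separately named group, `LINK` (a name chosen here for illustration), and `\r?\n` stands in for `\R`. `playlist_uri` is a hypothetical helper that picks whichever group matched.

```python
import re

# Port of the answer's pattern. Python has no (?J), so the next-line link is
# captured as LINK instead of a second URI group; \r?\n replaces PCRE's \R.
FIXED_PATTERN = re.compile(
    r'(?:(#EXT-X-STREAM-INF)|#EXT-X-I-FRAME-STREAM-INF):'
    r'(?:BANDWIDTH=(?P<BANDWIDTH>\d+),?'
    r'|CODECS=(?P<CODECS>"[^"]*"),?'
    r'|RESOLUTION=(?P<RESOLUTION>\d+x\d+),?'
    r'|AUDIO=(?P<AUDIO>"[^"]*"),?'
    r'|SUBTITLES=(?P<SUBTITLES>"[^"]*"),?'
    r'|URI=(?P<URI>"[^"]*"),?)*'
    # Only if group 1 (#EXT-X-STREAM-INF) matched: consume the line break and
    # capture the next line, stopping before any "#EXT".
    r'(?(1)\r?\n(?P<LINK>(?:(?!#EXT)\S)+))'
)


def playlist_uri(m):
    """Hypothetical helper: next-line link if present, else the unquoted URI."""
    if m.group("LINK") is not None:
        return m.group("LINK")
    uri = m.group("URI")
    return uri.strip('"') if uri is not None else None


sample = (
    '#EXT-X-STREAM-INF:BANDWIDTH=263851,CODECS="mp4a.40.2, avc1.4d400d",'
    'RESOLUTION=416x234,AUDIO="bipbop_audio",SUBTITLES="subs"\n'
    'gear1/prog_index.m3u8\n'
    '#EXT-X-I-FRAME-STREAM-INF:URI="gear1/iframe_index.m3u8",'
    'CODECS="avc1.4d400d",BANDWIDTH=28451\n'
)

print([playlist_uri(m) for m in FIXED_PATTERN.finditer(sample)])
# ['gear1/prog_index.m3u8', 'gear1/iframe_index.m3u8']
```

With PCRE's `(?J)` both links land in the same `URI` group; in Python the `playlist_uri` fallback plays that role.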

See the regex demo

So, basically, I added (?(1)\R(?<URI>(?:(?!#EXT)\S)+)) at the end and captured (#EXT-X-STREAM-INF) at the start as Group 1.

The conditional construct matches like this:

  • (? - start of the conditional construct
    • (1) - if Group 1 matched
    • \R - a line break
    • (?<URI> - start of a named capturing group
      • (?:(?!#EXT)\S)+ - any non-whitespace char (\S), one or more times (+), that does not start a #EXT char sequence (the so-called "tempered greedy token")
    • ) - end of the named capturing group
  • ) - end of the conditional construct
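The tempered greedy token can be tried on its own. This small Python sketch uses a hypothetical input in which a tag follows the link with no whitespace in between, so plain `\S+` would run past the link while the tempered token stops right before `#EXT`.

```python
import re

# Each \S is consumed only if "#EXT" does not start at that position.
TEMPERED = re.compile(r'(?:(?!#EXT)\S)+')

# Hypothetical input: a tag glued directly onto the link.
glued = 'gear1/prog_index.m3u8#EXT-X-ENDLIST'

print(TEMPERED.match(glued).group())    # gear1/prog_index.m3u8
print(re.match(r'\S+', glued).group())  # gear1/prog_index.m3u8#EXT-X-ENDLIST
```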


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM