I am trying to parse an HLS m3u8 file, and I am stuck on matching the m3u8 links. If URI= exists (as in #EXT-X-I-FRAME-STREAM-INF), grab the value in quotation marks; if it doesn't (as in #EXT-X-STREAM-INF), grab the link from the next line.
Text:
#EXT-X-STREAM-INF:BANDWIDTH=263851,CODECS="mp4a.40.2, avc1.4d400d",RESOLUTION=416x234,AUDIO="bipbop_audio",SUBTITLES="subs"
gear1/prog_index.m3u8 <== new line link
#EXT-X-I-FRAME-STREAM-INF:URI="gear1/iframe_index.m3u8",CODECS="avc1.4d400d",BANDWIDTH=28451
Regex:
(?:#EXT-X-STREAM-INF:|#EXT-X-I-FRAME-STREAM-INF:)(?:BANDWIDTH=(?<BANDWIDTH>\d+),?|CODECS=(?<CODECS>"[^"]*"),?|RESOLUTION=(?<RESOLUTION>\d+x\d+),?|AUDIO=(?<AUDIO>"[^"]*"),?|SUBTITLES=(?<SUBTITLES>"[^"]*"),?|URI=(?<URI>"[^"]*"),?)*
A quick fix for your pattern will look like this:
- capture the #EXT-X-STREAM-INF part into Group 1
- add the (?J) modifier to allow named capturing groups with identical names
The pattern will look like
(?J)(?:(#EXT-X-STREAM-INF)|#EXT-X-I-FRAME-STREAM-INF):(?:BANDWIDTH=(?<BANDWIDTH>\d+),?|CODECS=(?<CODECS>"[^"]*"),?|RESOLUTION=(?<RESOLUTION>\d+x\d+),?|AUDIO=(?<AUDIO>"[^"]*"),?|SUBTITLES=(?<SUBTITLES>"[^"]*"),?|URI=(?<URI>"[^"]*"),?)*(?(1)\R(?<URI>(?:(?!#EXT)\S)+))
See the regex demo
So, basically, I added the conditional construct (?(1)\R(?<URI>(?:(?!#EXT)\S)+)) at the end and captured (#EXT-X-STREAM-INF) at the start.
The conditional construct matches like this:
- (? - start of the conditional construct
- (1) - if Group 1 matched
- \R - a line break
- (?<URI> - start of a named capturing group
- (?:(?!#EXT)\S)+ - any non-whitespace char (\S), 1 or more occurrences (+), that is not the starting char of a #EXT char sequence (the so-called "tempered greedy token")
- ) - end of the named capturing group
- ) - end of the conditional construct
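
For anyone who wants to try the same idea outside a PCRE engine, here is a rough Python sketch of the conditional approach. It is an adaptation, not the pattern above verbatim: Python's re module has no (?J) modifier and no \R, so this version assumes a separately named group, URI_LINE, for the next-line link and uses \r?\n for the line break. The names STREAM_RE, parse_stream_links and SAMPLE are made up for the illustration.

```python
import re

# Adapted for Python's re: no (?J) and no \R are available, so the
# next-line link is captured in URI_LINE (a name chosen for this sketch)
# and the line break is written as \r?\n.
STREAM_RE = re.compile(
    r'(?:(#EXT-X-STREAM-INF)|#EXT-X-I-FRAME-STREAM-INF):'
    r'(?:BANDWIDTH=(?P<BANDWIDTH>\d+),?'
    r'|CODECS=(?P<CODECS>"[^"]*"),?'
    r'|RESOLUTION=(?P<RESOLUTION>\d+x\d+),?'
    r'|AUDIO=(?P<AUDIO>"[^"]*"),?'
    r'|SUBTITLES=(?P<SUBTITLES>"[^"]*"),?'
    r'|URI=(?P<URI>"[^"]*"),?)*'
    # Conditional: only if Group 1 (#EXT-X-STREAM-INF) matched, require a
    # line break followed by a link that does not start with #EXT.
    r'(?(1)\r?\n(?P<URI_LINE>(?:(?!#EXT)\S)+))'
)


def parse_stream_links(playlist):
    """Return one dict per matched tag with its bandwidth and link."""
    results = []
    for m in STREAM_RE.finditer(playlist):
        quoted_uri = m.group('URI')
        # Prefer the next-line link; otherwise fall back to the quoted URI=.
        link = m.group('URI_LINE') or (
            quoted_uri.strip('"') if quoted_uri else None
        )
        results.append({'bandwidth': m.group('BANDWIDTH'), 'link': link})
    return results


# The sample from the question, without the "<== new line link" note.
SAMPLE = (
    '#EXT-X-STREAM-INF:BANDWIDTH=263851,CODECS="mp4a.40.2, avc1.4d400d",'
    'RESOLUTION=416x234,AUDIO="bipbop_audio",SUBTITLES="subs"\n'
    'gear1/prog_index.m3u8\n'
    '#EXT-X-I-FRAME-STREAM-INF:URI="gear1/iframe_index.m3u8",'
    'CODECS="avc1.4d400d",BANDWIDTH=28451\n'
)

for entry in parse_stream_links(SAMPLE):
    print(entry['bandwidth'], entry['link'])
```

With the sample text from the question, the first entry should take its link from the line after the tag and the second from the quoted URI= value. A real playlist may carry attributes that this alternation does not list, so treat it as a starting point only.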