Need help extracting URLs from this text file into a list in Python

I tried a lot, but as I am still a beginner, I wasn't able to do it.

This is the file.

#EXTM3U
#EXT-X-VERSION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:1.985272,
https://cf-hls-media.sndcdn.com/media/0/31762/ANO2gOXIOByi.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLWhscy1tZWRpYS5zbmRjZG4uY29tL21lZGlhLyovKi9BTk8yZ09YSU9CeWkuMTI4Lm1wMyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY0MzcyNTA4OH19fV19&Signature=IxhfXJso~wmlcSjKvHwA8a91jNurbzOUJaePFormiG5iCpgSNjbMktJzryKjQQTJvWYYKal8q0omfE1GQxZHAojYRA4AmawwdaWcq90QV0Q4uNqVQi9aY9P6WbhnnepZW2B8FuER~pMy~MAldAUZ9UXKspWmDLTRgo1NCgpAqU-IgkESwtDffTo7kDpAiMQ2nyyI5bjeO0gMUPpa0hIfiCAJidXhyzwMzdvQy8woiyEUfxbFm0UsGFU0U8rlA6Xp7RiVnwnpeq-gxfxguSeeqvl-wduXbZAuwYVodhOiGtSFDLmDLT3x9WckIzCmnGcDsmPK~h~xVAZ8-Vxl3IMsSA__&Key-Pair-Id=APKAI6TU7MMXM5DG6EPQ
#EXTINF:2.977908,
https://cf-hls-media.sndcdn.com/media/31763/79410/ANO2gOXIOByi.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLWhscy1tZWRpYS5zbmRjZG4uY29tL21lZGlhLyovKi9BTk8yZ09YSU9CeWkuMTI4Lm1wMyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY0MzcyNTA4OH19fV19&Signature=IxhfXJso~wmlcSjKvHwA8a91jNurbzOUJaePFormiG5iCpgSNjbMktJzryKjQQTJvWYYKal8q0omfE1GQxZHAojYRA4AmawwdaWcq90QV0Q4uNqVQi9aY9P6WbhnnepZW2B8FuER~pMy~MAldAUZ9UXKspWmDLTRgo1NCgpAqU-IgkESwtDffTo7kDpAiMQ2nyyI5bjeO0gMUPpa0hIfiCAJidXhyzwMzdvQy8woiyEUfxbFm0UsGFU0U8rlA6Xp7RiVnwnpeq-gxfxguSeeqvl-wduXbZAuwYVodhOiGtSFDLmDLT3x9WckIzCmnGcDsmPK~h~xVAZ8-Vxl3IMsSA__&Key-Pair-Id=APKAI6TU7MMXM5DG6EPQ
#EXTINF:4.989302,
https://cf-hls-media.sndcdn.com/media/79411/159240/ANO2gOXIOByi.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLWhscy1tZWRpYS5zbmRjZG4uY29tL21lZGlhLyovKi9BTk8yZ09YSU9CeWkuMTI4Lm1wMyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY0MzcyNTA4OH19fV19&Signature=IxhfXJso~wmlcSjKvHwA8a91jNurbzOUJaePFormiG5iCpgSNjbMktJzryKjQQTJvWYYKal8q0omfE1GQxZHAojYRA4AmawwdaWcq90QV0Q4uNqVQi9aY9P6WbhnnepZW2B8FuER~pMy~MAldAUZ9UXKspWmDLTRgo1NCgpAqU-IgkESwtDffTo7kDpAiMQ2nyyI5bjeO0gMUPpa0hIfiCAJidXhyzwMzdvQy8woiyEUfxbFm0UsGFU0U8rlA6Xp7RiVnwnpeq-gxfxguSeeqvl-wduXbZAuwYVodhOiGtSFDLmDLT3x9WckIzCmnGcDsmPK~h~xVAZ8-Vxl3IMsSA__&Key-Pair-Id=APKAI6TU7MMXM5DG6EPQ
#EXTINF:9.978604,
https://cf-hls-media.sndcdn.com/media/159241/318900/ANO2gOXIOByi.128.mp3?Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiKjovL2NmLWhscy1tZWRpYS5zbmRjZG4uY29tL21lZGlhLyovKi9BTk8yZ09YSU9CeWkuMTI4Lm1wMyIsIkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTY0MzcyNTA4OH19fV19&Signature=IxhfXJso~wmlcSjKvHwA8a91jNurbzOUJaePFormiG5iCpgSNjbMktJzryKjQQTJvWYYKal8q0omfE1GQxZHAojYRA4AmawwdaWcq90QV0Q4uNqVQi9aY9P6WbhnnepZW2B8FuER~pMy~MAldAUZ9UXKspWmDLTRgo1NCgpAqU-IgkESwtDffTo7kDpAiMQ2nyyI5bjeO0gMUPpa0hIfiCAJidXhyzwMzdvQy8woiyEUfxbFm0UsGFU0U8rlA6Xp7RiVnwnpeq-gxfxguSeeqvl-wduXbZAuwYVodhOiGtSFDLmDLT3x9WckIzCmnGcDsmPK~h~xVAZ8-Vxl3IMsSA__&Key-Pair-Id=APKAI6TU7MMXM5DG6EPQ
#EXT-X-ENDLIST

This is the code I wrote.

import re
a=[]
regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
with open("playlist.m3u") as file:
    for line in file:
        urls = re.findall(regex, line)
        if(urls):
            a.append(urls)
print(a)

I am not sure the regex is necessary; you can use the startswith() string method instead:

a = []
with open("playlist.m3u") as file:
    for line in file:
        # URL lines start with "http"; the playlist tags start with "#"
        if line.startswith("http"):
            a.append(line.strip("\n"))  # drop the trailing newline
print(a)
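
A slightly more general sketch of the same idea, assuming the playlist follows the usual M3U layout in which every non-blank line that does not start with # is a URI. The function name extract_urls and the shortened example.com URLs are made up for illustration:

```python
def extract_urls(lines):
    """Return the URI lines of an M3U playlist, skipping blank lines and # tags."""
    urls = []
    for line in lines:
        line = line.strip()  # drop the trailing newline and stray spaces
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


# Hypothetical, shortened sample; the real file has much longer URLs.
sample = [
    "#EXTM3U\n",
    "#EXTINF:1.985272,\n",
    "https://example.com/media/0/31762/a.mp3?Key=1\n",
    "#EXTINF:2.977908,\n",
    "https://example.com/media/31763/79410/a.mp3?Key=1\n",
    "#EXT-X-ENDLIST\n",
]
print(extract_urls(sample))
```

With a real file you could pass the open file object directly, since iterating over it yields lines: open playlist.m3u in a with block and call extract_urls(file).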

Change your code:

import re
a=[]
regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
with open("playlist.m3u") as file:
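    for line in file.readlines():
        urls = re.findall(regex, line)
        # the pattern has several capture groups, so findall returns one tuple
        # per match; index 0 (group 1) holds the full URL
        a.extend(match[0] for match in urls)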
print(a)

Or try this:

import re
a=[]
regex = r"(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))"
with open("playlist.m3u") as file:
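    # file.read() returns the whole file as one string, so search it directly
    # (looping over that string would yield single characters, not lines)
    for match in re.findall(regex, file.read()):
        a.append(match[0])  # index 0 (group 1) holds the full URL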
print(a)
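
For reference, here is a small sketch of why the code in the question prints nested lists of tuples instead of plain URLs. When a pattern has more than one capturing group, re.findall returns a tuple of groups for each match rather than the matched string. The simplified patterns and the example.com text below are assumptions for illustration, not the regex from the question:

```python
import re

text = (
    "#EXTINF:1.98,\n"
    "https://example.com/a.mp3?x=1\n"
    "#EXTINF:2.97,\n"
    "http://example.com/b.mp3\n"
)

# Two capturing groups: findall yields (scheme, rest) tuples.
grouped = re.findall(r"(https?)://(\S+)", text)

# No capturing groups: findall yields the full matched strings.
plain = re.findall(r"https?://\S+", text)

# finditer + group(0) always gives the full match, whatever groups exist.
full = [m.group(0) for m in re.finditer(r"(https?)://(\S+)", text)]

print(grouped)
print(plain)
print(full)
```

So if you keep a pattern with groups, either take the group that holds the whole URL or switch to finditer and group(0).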
