
Finding the regex to get a URL between two phrases

I have the following script, which tries to extract this URL: https://clips-media-assets.twitch.tv/178569498.mp4. The URL sits between {"quality":"1080","source":" and a closing ", but my regex doesn't seem to be working.

import re
from bs4 import BeautifulSoup

dt = """
<body>
    <script>jQuery(window).load(function () {
      setTimeout(function(){s
      }, 1000);quality_options: [{"quality":"1080","source":"https://clips-media-assets.twitch.tv/178569498.mp4","frame_rate":60},{"quality":"720","source":"https://clips-media-assets.twitch.tv/AT-178569498-1280x720.mp4","frame_rate":60},{"quality":"480","source":"https://clips-media-assets.twitch.tv/AT-178569498-854x480.mp4","frame_rate":30},{"quality":"360","source":"https://clips-media-assets.twitch.tv/AT-178569498-640x360.mp4","frame_rate":30}]

    });</script>
</body>
[download]  28.2x of 57.90MiB at  1.54MiB/s ETA 00:26 


"""



pattern = re.compile(r'(?:\G(?!\A)|quality\":\"1080\",\"source\":\")(?:(?!\").)*', re.MULTILINE | re.DOTALL)
clipHTML = BeautifulSoup(dt, "html.parser")

# Search the text of every <script> tag for the 1080p source URL.
for script in clipHTML.find_all("script"):
    match = pattern.search(script.text)
    if match:
        url = match.group(0)
        print(url)

If you insist on using a regex to solve this, try this one:

(?<=quality\":\"1080\",\"source\":\")[^\"]+(?=\")
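As a minimal sketch of how this pattern could be applied in Python: the `script_text` value below is a shortened stand-in for the script contents in the question, not the full page. One likely reason the original pattern fails is that Python's built-in `re` module does not support the `\G` anchor, whereas a fixed-width lookbehind like the one above is supported.

```python
import re

# Shortened stand-in for the <script> text from the question (assumed sample).
script_text = (
    'quality_options: [{"quality":"1080","source":'
    '"https://clips-media-assets.twitch.tv/178569498.mp4","frame_rate":60},'
    '{"quality":"720","source":'
    '"https://clips-media-assets.twitch.tv/AT-178569498-1280x720.mp4","frame_rate":60}]'
)

# The lookbehind pins the match to the text right after the 1080 entry's
# "source":" prefix; [^\"]+ then consumes everything up to the closing quote,
# which the lookahead checks without including it in the match.
pattern = re.compile(r'(?<=quality\":\"1080\",\"source\":\")[^\"]+(?=\")')

match = pattern.search(script_text)
url = match.group(0) if match else None
print(url)
```

Because the lookarounds are zero-width, `match.group(0)` is the bare URL with no surrounding quotes, so no capture group is needed.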

I can't speak to this specific case, but in general it is not ideal to parse JSON with regular expressions. You can make the regex tolerate a variable number of spaces by adding ( *) where whitespace may appear, but I still think it is better to use a JSON parser.
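A rough sketch of the JSON-parser route, assuming the array always follows a `quality_options:` key as in the question's sample. The helper name `find_source` and the shortened `script_text` are made up for illustration. It uses `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever JavaScript follows it.

```python
import json

# Shortened stand-in for the <script> text from the question (assumed sample).
script_text = (
    'setTimeout(function(){}, 1000);quality_options: '
    '[{"quality":"1080","source":'
    '"https://clips-media-assets.twitch.tv/178569498.mp4","frame_rate":60},'
    '{"quality":"720","source":'
    '"https://clips-media-assets.twitch.tv/AT-178569498-1280x720.mp4","frame_rate":60}]'
    '\n\n});'
)


def find_source(text, quality, key="quality_options:"):
    """Return the source URL for the given quality, or None if not found.

    Hypothetical helper: assumes a JSON array follows `key` in the text.
    """
    start = text.find(key)
    if start == -1:
        return None
    bracket = text.find("[", start)
    if bracket == -1:
        return None
    # raw_decode parses a single JSON value beginning at `bracket` and
    # returns it with the index where parsing stopped.
    options, _end = json.JSONDecoder().raw_decode(text, bracket)
    for option in options:
        if option.get("quality") == quality:
            return option.get("source")
    return None


url = find_source(script_text, "1080")
print(url)
```

This approach does not depend on key order or spacing inside each object, which is where a hand-written regex tends to break.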


 