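Finding the regex to get a URL between two phrases
I have the following script, which tries to extract this URL: https://clips-media-assets.twitch.tv/178569498.mp4. The URL sits between {"quality":"1080","source":" and the next ", but my regular expression doesn't seem to work.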
import re
from bs4 import BeautifulSoup

dt = """
<body>
<script>jQuery(window).load(function () {
setTimeout(function(){s
}, 1000);quality_options: [{"quality":"1080","source":"https://clips-media-assets.twitch.tv/178569498.mp4","frame_rate":60},{"quality":"720","source":"https://clips-media-assets.twitch.tv/AT-178569498-1280x720.mp4","frame_rate":60},{"quality":"480","source":"https://clips-media-assets.twitch.tv/AT-178569498-854x480.mp4","frame_rate":30},{"quality":"360","source":"https://clips-media-assets.twitch.tv/AT-178569498-640x360.mp4","frame_rate":30}]
});</script>
</body>
[download] 28.2x of 57.90MiB at 1.54MiB/s ETA 00:26
"""
pattern = re.compile(r'(?:\G(?!\A)|quality\":\"1080\",\"source\":\")(?:(?!\").)*', re.MULTILINE | re.DOTALL)
clipHTML = BeautifulSoup(dt, "html.parser")
scripts = clipHTML.findAll(['script'])
for script in scripts:
    if script:
        match = pattern.search(script.text)
        if match:
            email = match.group(0)
            print(email)
If you insist on solving this with a regular expression, try this one:
(?<=quality\":\"1080\",\"source\":\")[^\"]+(?=\")
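As a minimal sketch of how that pattern might be used from Python: the sample text below is a shortened copy of the question's data, and the names `text` and `url_1080` are only illustrative. Note that Python's built-in `re` module does not support the `\G` anchor used in the question's pattern, which is one reason the original attempt fails there.

```python
import re

# Shortened copy of the script content from the question (two entries only).
text = (
    'quality_options: [{"quality":"1080","source":'
    '"https://clips-media-assets.twitch.tv/178569498.mp4","frame_rate":60},'
    '{"quality":"720","source":'
    '"https://clips-media-assets.twitch.tv/AT-178569498-1280x720.mp4","frame_rate":60}]'
)

# The lookbehind anchors the match right after the 1080 "source" key,
# [^\"]+ consumes the URL, and the lookahead stops at the closing quote.
pattern = re.compile(r'(?<=quality\":\"1080\",\"source\":\")[^\"]+(?=\")')

match = pattern.search(text)
url_1080 = match.group(0) if match else None
print(url_1080)
```

Because the lookbehind is fixed-width, this only matches when the keys appear with exactly this spacing and order.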
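I don't know your exact situation, but I should mention that parsing JSON with a regular expression is generally not ideal. You could of course add ` *` to the regex to allow a variable number of spaces, but I still think a JSON parser is the better choice.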
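Both ideas can be sketched in Python, under a few assumptions. The first part assumes the array always follows the literal key `quality_options:` and is valid JSON; `script_text` is a shortened copy of the question's data. The second part uses a capture group rather than a lookbehind, because Python's `re` rejects variable-width lookbehinds, so optional spaces cannot simply be added to the pattern above. The `spaced` string and its `example.invalid` URL are made up for illustration.

```python
import json
import re

# Shortened copy of the <script> body from the question.
script_text = '''jQuery(window).load(function () {
}, 1000);quality_options: [{"quality":"1080","source":"https://clips-media-assets.twitch.tv/178569498.mp4","frame_rate":60},{"quality":"720","source":"https://clips-media-assets.twitch.tv/AT-178569498-1280x720.mp4","frame_rate":60}]
});'''

# 1) JSON parser: find where the array starts, then let json decode it.
marker = re.search(r'quality_options:\s*', script_text)
if marker is None:
    raise ValueError("quality_options not found")

# raw_decode parses one JSON value starting at the given index and
# ignores whatever JavaScript follows it.
options, _end = json.JSONDecoder().raw_decode(script_text, marker.end())

url = next((o["source"] for o in options if o.get("quality") == "1080"), None)
print(url)

# 2) Whitespace-tolerant regex: \s* between tokens, URL in group 1.
loose = re.compile(r'"quality"\s*:\s*"1080"\s*,\s*"source"\s*:\s*"([^"]+)"')

# Hypothetical input with extra spaces, for illustration only.
spaced = '{"quality" : "1080", "source": "https://example.invalid/clip.mp4"}'
loose_match = loose.search(spaced)
print(loose_match.group(1))
```

The JSON route does not depend on key order or spacing inside each object, which is the main reason to prefer it when the page layout may change.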