
Search for a variable inside script tags with bs4 and Python

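import re
import requests
from bs4 import BeautifulSoup

url = "https://www.xxxx.com"  # requests needs the scheme in the URL
rlink = requests.get(url, cookies=cookies).content  # cookies: dict defined elsewhere
html = BeautifulSoup(rlink, 'html.parser')
scripttags = html.find_all("script")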

The page's DOM contains 70-odd script tags, and I need to find one variable (it occurs only once) by searching every script tag.

The variable is:

var playbackUrl = 'https://www.yyyy.com'

for i in range(len(scripttags)):
    if "playbackUrl" in str(scripttags[i]):
        for j in str(scripttags[i]).split("\n"):
            if "playbackUrl" in j:
                url_=re.search("'(.*)'", j).group(1)
                print(url_)

My script does the job, but I wonder whether there is a smarter way to do it.

The code is more readable if you iterate over the list directly with a for loop instead of using range(len()).

You also don't have to split the script text into lines.

html = '''<script>
var other = 'test';
var playbackUrl = 'https://www.example1.com';
var next = 'test';
</script>

<script>
var other = 'test';
var playbackUrl = 'https://www.example2.com';
var next = 'test';
</script>
'''

from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(html, 'html.parser')
scripttags = soup.find_all("script")

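for script in scripttags:

    # re.search returns the first match; the lazy (.*?) stops at the first closing quote
    results = re.search("var playbackUrl = '(.*?)'", script.text)
    if results:
        print('search:', results[1])

    # OR

    # re.findall returns a list with every captured value
    results = re.findall("var playbackUrl = '(.*?)'", script.text)
    if results:
        print('findall:', results[0])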

Result:

search: https://www.example1.com
findall: https://www.example1.com
search: https://www.example2.com
findall: https://www.example2.com
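
If you need this in more than one place, the same idea can be wrapped in a small helper. The following is only a sketch under some assumptions: the function name find_js_string_vars is made up for illustration, the second script in the sample HTML uses double quotes (not something from the question) to show that case, and the pattern only handles simple `var name = '...'` assignments on a single line. It is a regex, not a JavaScript parser.

```python
import re
from bs4 import BeautifulSoup

html = '''<script>
var other = 'test';
var playbackUrl = 'https://www.example1.com';
var next = 'test';
</script>

<script>
var other = "test";
var playbackUrl = "https://www.example2.com";
</script>
'''

def find_js_string_vars(html, name):
    """Return every string literal assigned to `var <name>` in inline scripts.

    Hypothetical helper for illustration; not part of bs4.
    """
    # (['"]) captures the opening quote, (.*?) lazily captures the value,
    # and \1 requires the same quote character to close it.
    pattern = re.compile(r"\bvar\s+" + re.escape(name) + r"\s*=\s*(['\"])(.*?)\1")
    soup = BeautifulSoup(html, "html.parser")
    values = []
    for script in soup.find_all("script"):
        # .string is None for scripts without an inline body (e.g. src-only tags)
        source = script.string or ""
        for match in pattern.finditer(source):
            values.append(match.group(2))
    return values

urls = find_js_string_vars(html, "playbackUrl")
print(urls)
```

Since the variable in the question is unique, you would take urls[0] after checking that the list is not empty.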


 