Regex - Match javascript variables in html source code

I have a web page with JavaScript in it, and I need to match the two variables passed to a function:

<html>
<!--Some html code-->
document.write(function('variable1', 'variable2'));
<!--Some html code-->
</html>

variable1 and variable2 can be strings of any length containing a mix of letters and digits. I need to capture both of them. This is what I use now:

import re

data = getSoup(url)  # my function to get the BeautifulSoup object
# Find the <script> tag containing the call, then strip the call prefix from its text
script = data.find('script', string=re.compile(r'document\.write\(function\(')).text.replace("document.write(function('", '')
variable1 = script.split("', '")[0]
variable2 = script.split("', '")[1].replace("'));", "")

But I would like something simpler and "safer", not least because the function is not always inside a script tag.

Update: Thanks to Thomas Ayoub's answer, I found a simple solution that works for me:

import re

# findall returns a list of (group1, group2) tuples; take the first match
script = re.findall(r"document\.write\(function\('(.*?)', '(.*?)'\)\);", str(data))[0]
variable1 = script[0]
variable2 = script[1]
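A minimal sketch of the same idea using `re.search` instead of `re.findall(...)[0]`, so that a page without the call returns `None` instead of raising `IndexError`. The helper name `extract_args` and the sample `html` string are hypothetical, for illustration only; the pattern is the one from the update.

```python
import re

# Same pattern as the update; each (.*?) lazily captures one quoted argument
PATTERN = re.compile(r"document\.write\(function\('(.*?)', '(.*?)'\)\);")


def extract_args(html):
    """Return (variable1, variable2) from the first call, or None if absent.

    Hypothetical helper: `html` is the page source as a plain string,
    e.g. str(data) for a BeautifulSoup object.
    """
    match = PATTERN.search(html)
    if match is None:
        return None
    return match.group(1), match.group(2)


html = """<html>
<!--Some html code-->
document.write(function('abc123', 'x9y8'));
<!--Some html code-->
</html>"""

print(extract_args(html))             # ('abc123', 'x9y8')
print(extract_args("<html></html>"))  # None
```

Because the search runs on the whole page string rather than on a `<script>` tag, it also finds the call when it sits elsewhere in the markup.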

You can use this regex, which allows optional whitespace around the arguments and the comma:

regex = r"document\.write\(function\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)"
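As a sketch, the regex can be applied to the entire page source as a plain string, so it does not matter whether the call is inside a `<script>` tag; `re.finditer` also collects every call if the page contains more than one. The sample markup below, including the extra spaces in the second call, is made up to show that the `\s*` parts tolerate spacing.

```python
import re

regex = r"document\.write\(function\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)"

# Hypothetical page source; the second call has extra spaces and sits in a <p>
html = """<html>
<!--Some html code-->
document.write(function('variable1', 'variable2'));
<p>document.write(function( 'a1b2' ,'c3d4' ));</p>
</html>"""

# Each match yields one (first_argument, second_argument) tuple
pairs = [m.groups() for m in re.finditer(regex, html)]
print(pairs)  # [('variable1', 'variable2'), ('a1b2', 'c3d4')]
```

Note that `[^']+` requires at least one character and stops at the first `'`, so an empty argument will not match and an argument containing an escaped quote would be cut short.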
