
How to extract JavaScript variables using Python and bs4

<script type="text/javascript">var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";var csrfMagicName = "__csrf_magic";</script>

The above script tag is from a webpage.

script = soup.find_all('script')[5]

With the line above I was able to extract the script tag I want, but I also need to extract the values of the variables inside it. I am using BeautifulSoup in my Python script to extract the data.

You could use

(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"

See a demo on regex101.com.
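As a minimal sketch of how this pattern could be applied in Python: the helper name extract_js_vars is hypothetical, and script_text stands in for the text of the tag, which with BeautifulSoup would typically come from script.string.

```python
import re

# Pattern from the answer: group 1 is the variable name,
# group 2 is the double-quoted value (without the quotes).
JS_VAR = re.compile(r'(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"')


def extract_js_vars(script_text):
    """Return a dict mapping variable names to their double-quoted values."""
    # findall returns (name, value) tuples because the pattern has two groups.
    return dict(JS_VAR.findall(script_text))


# Assumption: this string mirrors the content of the script tag in the question.
# With BeautifulSoup you would pass script.string instead.
script_text = (
    'var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";'
    'var csrfMagicName = "__csrf_magic";'
)

variables = extract_js_vars(script_text)
print(variables["csrfMagicToken"])
print(variables["csrfMagicName"])
```

Returning a dict keeps the lookup by variable name simple. If the same name is declared twice, the later declaration wins.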


Note: Using regular expressions on code has some general drawbacks. For example, with the pattern above, something like let x = -10; would not be matched even though it is perfectly valid JavaScript. Single quotes are not supported either; whether that matters depends on your actual input.


That being said, you could go for:

(?:var|let)\s+
(?P<key>\w+)\s*=\s*
(['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))

See another demo on regex101.com.
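The multi-line pattern above can be used in Python by compiling it with re.VERBOSE, which ignores the line breaks and indentation. The following is only a sketch: the helper name extract_js_vars and the sample string are made up for illustration, and all values come back as strings.

```python
import re

# The pattern above, compiled in verbose mode. Group 2 is the optional
# opening quote; the conditional (?(2)...|...) picks value1 when a quote
# was matched and value2 (everything up to the semicolon) otherwise.
JS_VAR = re.compile(r"""
    (?:var|let)\s+
    (?P<key>\w+)\s*=\s*
    (['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
""", re.VERBOSE)


def extract_js_vars(script_text):
    """Return a dict of variable names to raw values (always strings)."""
    result = {}
    for match in JS_VAR.finditer(script_text):
        value = match.group("value1")
        if value is None:
            # Unquoted value such as a number; trim surrounding whitespace.
            value = match.group("value2").strip()
        result[match.group("key")] = value
    return result


# Hypothetical input covering double quotes, no quotes, and single quotes.
sample = "var csrfMagicName = \"__csrf_magic\";let x = -10;var label = 'single';"

variables = extract_js_vars(sample)
print(variables)
```

Unquoted values such as -10 are returned as the string "-10"; converting them to numbers is left to the caller.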


This still leaves you helpless against escaped quotes such as let x = "some \" string"; and against variable declarations inside comments. In general, favour a parser solution.
