How to extract JavaScript variables using Python bs4

<script type="text/javascript">var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";var csrfMagicName = "__csrf_magic";</script>

The above script tag is from a webpage.

script = soup.find_all('script')[5]

Using the line of code above, I was able to extract the script tag I want, but I still need to extract the values of the variables inside it. I am using BeautifulSoup in my Python script to extract the data.

You could use:

(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"

See a demo on regex101.com.
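As a minimal sketch of how this pattern could be applied in Python: the snippet below runs it over the script text from the question with `re.findall`. It assumes the text has already been pulled out of the tag (for instance via `script.string` on the BeautifulSoup tag, which is an assumption about how the tag is read, not something stated in the question); the names `script_text`, `PATTERN` and `js_vars` are illustrative.

```python
import re

# Script text as it might come from BeautifulSoup, e.g.
#   script_text = soup.find_all('script')[5].string   (assumed access path)
script_text = (
    'var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";'
    'var csrfMagicName = "__csrf_magic";'
)

# Group 1 is the variable name, group 2 the double-quoted value.
PATTERN = re.compile(r'(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"')

# findall returns a list of (name, value) tuples, which dict() accepts directly.
js_vars = dict(PATTERN.findall(script_text))

print(js_vars["csrfMagicToken"])
print(js_vars["csrfMagicName"])
```

Collecting the matches into a dict keeps the lookup by variable name simple; if the same name is declared twice, the later declaration wins.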


Note: Using regular expressions on code has a couple of general drawbacks. For example, with the pattern above, something like let x = -10; would not be matched even though it is perfectly valid JavaScript. Single quotes are not supported (yet) either; it all depends on your actual input.


That being said, you could go for the following (written in verbose mode, so whitespace and line breaks in the pattern are ignored):

(?:var|let)\s+
(?P<key>\w+)\s*=\s*
(['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))

See another demo on regex101.com.
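A hedged sketch of using this second pattern from Python: it is compiled with `re.VERBOSE` so the line breaks are ignored, and the value is taken from whichever of the two named groups matched. The helper name `extract_js_vars` and the sample input are hypothetical, chosen only for illustration.

```python
import re

# Same pattern as above; re.VERBOSE lets it span several lines.
# Group 2 is the optional opening quote that the conditional (?(2)...) tests.
PATTERN = re.compile(
    r"""
    (?:var|let)\s+
    (?P<key>\w+)\s*=\s*
    (['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
    """,
    re.VERBOSE,
)


def extract_js_vars(script_text):
    """Return a dict of variable name -> raw value (always a string)."""
    result = {}
    for match in PATTERN.finditer(script_text):
        value = match.group("value1")      # set when the value was quoted
        if value is None:
            value = match.group("value2").strip()  # unquoted, up to the ';'
        result[match.group("key")] = value
    return result


# Hypothetical sample mixing double quotes, single quotes and a bare number.
sample = (
    'var csrfMagicName = "__csrf_magic";'
    "let y = 'abc';"
    "let x = -10;"
)
print(extract_js_vars(sample))
```

Note that unquoted values come back as text (`"-10"`, not the number), so any type conversion is left to the caller.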


This still leaves you helpless against escaped quotes like let x = "some \" string"; or against variable declarations in comments. In general, favour a parser solution.
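Both weaknesses can be reproduced with a short check. The sketch below feeds the verbose pattern two made-up inputs, one with an escaped quote and one with a commented-out declaration; the variable names and sample strings are assumptions for illustration only.

```python
import re

PATTERN = re.compile(
    r"""
    (?:var|let)\s+
    (?P<key>\w+)\s*=\s*
    (['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
    """,
    re.VERBOSE,
)

# Hypothetical input 1: a string literal containing an escaped double quote.
escaped = r'let x = "some \" string";'
escaped_match = PATTERN.search(escaped)
# The lazy .+? stops at the first quote character, escaped or not,
# so the captured value is cut short.
truncated_value = escaped_match.group("value1")

# Hypothetical input 2: one declaration is commented out, one is live.
commented = '// var old = "stale";\nvar live = "fresh";'
# The pattern knows nothing about comments, so both names are reported.
found_keys = [m.group("key") for m in PATTERN.finditer(commented)]

print(repr(truncated_value))
print(found_keys)
```

A JavaScript parser or tokenizer would treat the escape sequence and the comment correctly, which is why it is the more robust choice when the input is not under your control.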
