How to extract JavaScript variables using Python bs4
<script type="text/javascript">var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";var csrfMagicName = "__csrf_magic";</script>
The script tag above is from a webpage.
script = soup.find_all('script')[5]
With the line of code above I was able to extract the script tag I want, but I need to extract the values of the variables inside it from a Python script. I am using BeautifulSoup in my Python script to extract the data.
You could use:
(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"
See a demo on regex101.com.
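As a minimal sketch, the pattern can be applied with Python's re module. The name script_text is an assumption: in the question's code it would be script.string, the text inside the tag returned by soup.find_all('script')[5]. It is hard-coded here so the snippet runs on its own.

```python
import re

# Assumption: in the question's code this would be `script.string`,
# where `script = soup.find_all('script')[5]`.
script_text = (
    'var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";'
    'var csrfMagicName = "__csrf_magic";'
)

# Same pattern as above: group 1 is the variable name,
# group 2 is the double-quoted value.
pattern = re.compile(r'(?:var|let)\s+(\w+)\s*=\s*"([^"]+)"')

# findall returns (name, value) tuples, which dict() turns into a mapping.
js_vars = dict(pattern.findall(script_text))

print(js_vars["csrfMagicToken"])
print(js_vars["csrfMagicName"])
```

Collecting the matches into a dict makes it easy to look a value up by its JavaScript variable name.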
Note: There are, however, a couple of general drawbacks to using regular expressions on code. For example, with the pattern above, a declaration such as let x = -10; would not be matched, even though it is perfectly valid JavaScript code. Also, single quotes are not supported (yet); it depends entirely on your actual input.
That being said, you could go for:
(?:var|let)\s+
(?P<key>\w+)\s*=\s*
(['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
See another demo on regex101.com.
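The multi-line pattern is written in verbose style, so in Python it needs the re.VERBOSE flag. The sketch below shows one way to use it. The helper name extract_vars and the sample input are assumptions made for illustration, not part of the original answer.

```python
import re

# The pattern from above. Group 2 is the optional opening quote; the
# conditional (?(2)...|...) picks value1 when a quote was seen, else value2.
pattern = re.compile(
    r"""
    (?:var|let)\s+
    (?P<key>\w+)\s*=\s*
    (['"])?(?(2)(?P<value1>.+?)\2|(?P<value2>[^;]+))
    """,
    re.VERBOSE,
)


def extract_vars(js_source):
    """Hypothetical helper: map each declared name to its raw value text."""
    result = {}
    for m in pattern.finditer(js_source):
        # Only one of value1 / value2 takes part in any given match.
        value = m.group("value1")
        if value is None:
            value = m.group("value2")
        result[m.group("key")] = value.strip()
    return result


# Assumed sample input covering double quotes, single quotes and a bare number.
sample = (
    'var csrfMagicToken = "sid:bf8be784734837a64a47fcc30b9df99,162591180";'
    "var csrfMagicName = '__csrf_magic';"
    "let x = -10;"
)
print(extract_vars(sample))
```

All values come back as strings, so a number such as -10 would still need converting with int() or float() if it is used numerically.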
This still leaves you helpless against escaped quotes like let x = "some \" string"; or against variable declarations in comments. In general, favour a parser solution.
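To illustrate the direction, here is a rough sketch of a small scanner that consumes comments and string literals as whole tokens, so declarations inside comments are skipped and escaped quotes stay inside their string. It is not a JavaScript parser and is not from the original answer. The names TOKEN, UNESCAPE and extract_declarations are hypothetical, and it ignores regex literals, template strings and const. A dedicated JavaScript parser library would be more robust.

```python
import re

# One alternation, tried left to right at every position, so a comment or a
# free-standing string is consumed before anything inside it can match.
TOKEN = re.compile(
    r"""
    //[^\n]*|/\*.*?\*/                       # line or block comment
  | \b(?:var|let)\s+(?P<name>\w+)\s*=\s*     # declaration and its value
    (?:
        "(?P<dq>(?:[^"\\]|\\.)*)"            # double-quoted, escapes allowed
      | '(?P<sq>(?:[^'\\]|\\.)*)'            # single-quoted, escapes allowed
      | (?P<bare>[^;\n]+)                    # anything else up to ; or newline
    )
  | "(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'      # free-standing string, skipped
    """,
    re.VERBOSE | re.DOTALL,
)

# Simplification: only escaped quotes and backslashes are unescaped.
UNESCAPE = re.compile(r"""\\(["'\\])""")


def extract_declarations(js_source):
    """Hypothetical helper: return {name: value} for var/let declarations."""
    found = {}
    for m in TOKEN.finditer(js_source):
        name = m.group("name")
        if name is None:
            continue  # a comment or a free-standing string
        if m.group("dq") is not None:
            value = UNESCAPE.sub(r"\1", m.group("dq"))
        elif m.group("sq") is not None:
            value = UNESCAPE.sub(r"\1", m.group("sq"))
        else:
            value = m.group("bare").strip()
        found[name] = value
    return found


# Assumed sample input exercising the cases named above.
sample = r'''
// var ignored = "in a line comment";
/* let alsoIgnored = 1; */
var csrfMagicName = "__csrf_magic";
let x = "some \" string";
let n = -10;
'''
print(extract_declarations(sample))
```

The order of the alternatives matters: because comments and strings are matched as complete tokens, the scanner never looks for a declaration inside them.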