How to get a JavaScript variable from a script tag using Python and BeautifulSoup

Question

I want to return the "id" values from the variable meta using BeautifulSoup and Python. Is this possible? Additionally, I don't know how to find the particular script tag that contains the meta variable, because it has no unique identifier and the site has many other script tags. I'm also using Selenium, so I can follow answers that rely on it.

<script>
    var meta = "variants":[{"id":12443604615241,"price":14000}, 
    {"id":12443604648009,"price":14000}]
</script>

Answer 1

If you are using Selenium, there is no need to parse the HTML to get the JS variable. Call execute_script() on the WebDriver instance to bring it into Python:

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://whatever.com/')
meta = driver.execute_script('return meta')

And that's it: meta now holds the JS variable, converted to the corresponding Python type (a JS object becomes a dict, an array becomes a list).
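To get from the returned value to the "id" values the question asks for, a minimal sketch follows. It assumes the page's meta object has the shape shown in the question; the helper name variant_ids and the stand-in dict are illustrative only, not part of Selenium.

```python
def variant_ids(meta):
    """Return the "id" of every entry under meta["variants"]."""
    return [variant["id"] for variant in meta.get("variants", [])]


# Stand-in for the value returned by driver.execute_script('return meta'),
# assuming the shape shown in the question (hypothetical data).
meta = {
    "variants": [
        {"id": 12443604615241, "price": 14000},
        {"id": 12443604648009, "price": 14000},
    ]
}

print(variant_ids(meta))
```

With a live driver, the same helper would be called as variant_ids(driver.execute_script('return meta')).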

Answer 2

You can use the built-in re and json modules to extract JavaScript variables:

from bs4 import BeautifulSoup
import re
import json
from pprint import pprint

data = '''
<html>
<body>

<script>
    var meta = "variants":[{"id":12443604615241,"price":14000},
    {"id":12443604648009,"price":14000}]
</script>

</body>
'''

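soup = BeautifulSoup(data, 'lxml')

# Capture everything after "meta =" up to the closing "}]" of the variants list.
match = re.search(r'meta\s*=\s*(.*?}])\s*\n', str(soup.find('script')), flags=re.DOTALL)

# The captured text lacks the outer braces, so add them to form valid JSON.
json_data = json.loads('{' + match[1] + '}')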

pprint(json_data)

This prints:

{'variants': [{'id': 12443604615241, 'price': 14000},
              {'id': 12443604648009, 'price': 14000}]}
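The example above takes the first script tag on the page. When there are many unlabeled script tags, as the question describes, one option is to select the tag by its content instead. The sketch below is one possible approach, not the answerer's method. It assumes the real page assigns a complete object literal (var meta = {...};) that is valid JSON, which the truncated snippet in the question does not show, and it uses the html.parser backend so that lxml is not required. The sample HTML is hypothetical.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical page with several unlabeled <script> tags, as in the question.
html = '''
<html><body>
<script>var other = 1;</script>
<script>
    var meta = {"variants":[{"id":12443604615241,"price":14000},
    {"id":12443604648009,"price":14000}]};
</script>
</body></html>
'''

soup = BeautifulSoup(html, 'html.parser')

# Pick the script by its content, since it has no id or class to select on.
pattern = re.compile(r'\bvar\s+meta\s*=')
script = soup.find('script', string=pattern)

# Start at the first "{" after the assignment and let the JSON decoder find
# where the object ends. This assumes the value is a valid JSON literal.
source = script.string
start = source.index('{', pattern.search(source).end())
meta, _ = json.JSONDecoder().raw_decode(source, start)

variant_ids = [variant['id'] for variant in meta['variants']]
print(variant_ids)  # [12443604615241, 12443604648009]
```

Using raw_decode avoids having to write a regex that matches the end of the object, which tends to break when the literal contains nested brackets.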


Answer 1
1 · Accepted · 2018-08-10 02:30:41

Answer 2
1 · 2018-08-10 06:30:15
