
Scraping a javascript / json object from a webpage using BeautifulSoup?

I am using BeautifulSoup to get the HTML of a webpage, and that works fine so far. But what I really want is the content of a JavaScript chunk inside the HTML, enclosed in <script type="text/javascript">. Somewhere inside that tag there is a giant array with a lot of {} brackets, which I believe is a JSON array?

Is there a way to extract that entire array from the HTML?

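You are looking for the function json.loads. Once you have the array's text out of the script tag, json.loads parses it into the equivalent Python objects (dicts, lists, strings, numbers, None):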

>>> import json
>>> obj = json.loads('{"a": 12, "b": null}')
>>> obj
{'a': 12, 'b': None}
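json.loads only works on a string that contains nothing but JSON, while a script tag usually wraps the array in JavaScript such as `var data = [...];`. A minimal sketch of one way to bridge that gap, assuming the array is assigned after a known marker (here the hypothetical `var data`) and is strict JSON (double-quoted keys and strings, no trailing commas, no comments): skip to the marker, then let `json.JSONDecoder().raw_decode` parse exactly one JSON value and ignore the rest of the script. The helper name `extract_json_value` and the sample script are illustrative only.

```python
import json


def extract_json_value(script_text, marker):
    """Return the first JSON value that follows `marker` in `script_text`.

    Hypothetical helper: assumes the JavaScript looks like `var data = [...];`
    and that the literal is strict JSON. Raises ValueError if the marker is
    missing or the value is not valid JSON.
    """
    start = script_text.index(marker) + len(marker)
    # Skip whitespace and the '=' between the marker and the value, e.g. " = ["
    while script_text[start] in " \t\r\n=":
        start += 1
    # raw_decode parses one JSON value from the start of the string and
    # returns (value, end_index), ignoring the trailing ";" and later code
    value, _end = json.JSONDecoder().raw_decode(script_text[start:])
    return value


# Sample script content (made up for illustration)
script_text = """
var settings = {"debug": false};
var data = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}];
render(data);
"""

items = extract_json_value(script_text, "var data")
print(items[1]["name"])  # b
```

With BeautifulSoup, the script text would typically come from something like `soup.find("script", type="text/javascript").string`. If the page's object literal is not strict JSON (single quotes, unquoted keys), json will reject it and a different parsing approach is needed.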
