How to convert raw javascript object to python dictionary?
When screen-scraping some websites, I extract data from <script> tags. The data I get is not in standard JSON format, so I cannot use json.loads().
# from
js_obj = '{x:1, y:2, z:3}'
# to
py_obj = {'x':1, 'y':2, 'z':3}
Currently, I use regex to transform the raw data into JSON format, but this becomes painful when I encounter complicated data structures. Do you have a better solution?
demjson.decode()

import demjson

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = demjson.decode(js_obj)

jsonnet.evaluate_snippet()

import json, _jsonnet

# from
js_obj = '{x:1, y:2, z:3}'

# to
py_obj = json.loads(_jsonnet.evaluate_snippet('snippet', js_obj))

ast.literal_eval()

import ast

# from
js_obj = "{'x':1, 'y':2, 'z':3}"

# to
py_obj = ast.literal_eval(js_obj)
I faced the same problem this afternoon and finally found a quite good solution: JSON5.
The syntax of JSON5 is closer to native JavaScript, so it can help you parse non-standard JSON objects.
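A minimal sketch of the JSON5 approach, assuming the third-party json5 package from PyPI is installed (pip install json5). The answer above names only the format, so the choice of package and the helper name parse_js_object are assumptions made for illustration.

```python
try:
    import json5  # third-party package (assumption): pip install json5
except ImportError:
    json5 = None


def parse_js_object(text):
    """Parse a JavaScript-style object literal into Python data via JSON5.

    Hypothetical helper; raises RuntimeError if json5 is not installed.
    """
    if json5 is None:
        raise RuntimeError("the json5 package is required: pip install json5")
    return json5.loads(text)
```

With the package installed, parse_js_object("{x:1, y:2, z:3}") should give the dictionary from the question. JSON5 also allows single-quoted strings and trailing commas, which covers much of what appears in scraped <script> data.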
This will likely not work everywhere, but as a start, here's a simple regex that should convert the keys into quoted strings so you can pass the result into json.loads. Or is this what you're already doing?
In[70] : quote_keys_regex = r'([\{\s,])(\w+)(:)'
In[71] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj)
Out[71]: '{"x":1, "y":2, "z":3}'
In[72] : js_obj_2 = '{x:1, y:2, z:{k:3,j:2}}'
In[73] : re.sub(quote_keys_regex, r'\1"\2"\3', js_obj_2)
Out[73]: '{"x":1, "y":2, "z":{"k":3,"j":2}}'
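Wrapped into a function, the regex approach above might look like the following sketch. The helper name loads_js_object is hypothetical. The substitution is purely textual, so a string value that itself contains something like ", b: " gets rewritten as well and the result no longer parses.

```python
import json
import re

# Same pattern as above: a bare word that follows '{', whitespace or ','
# and is immediately followed by ':' is treated as an unquoted key.
QUOTE_KEYS_REGEX = re.compile(r'([\{\s,])(\w+)(:)')


def loads_js_object(js_obj):
    """Quote bare keys, then parse with json.loads (hypothetical helper)."""
    json_text = QUOTE_KEYS_REGEX.sub(r'\1"\2"\3', js_obj)
    return json.loads(json_text)
```

Keys that are already double-quoted are left alone, because the character before the key name is then a quote rather than '{', whitespace or ','.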
If you have node available on the system, you can ask it to evaluate the JavaScript expression for you and print the stringified result. The resulting JSON can then be fed to json.loads:
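import json
from subprocess import PIPE, Popen

def evaluate_javascript(s):
    """Evaluate and stringify a javascript expression in node.js, and convert the
    resulting JSON to a Python object"""
    # Caution: s is executed as JavaScript code, so only pass trusted input.
    # 'node -' reads the script from stdin.
    node = Popen(['node', '-'], stdin=PIPE, stdout=PIPE)
    stdout, _ = node.communicate(f'console.log(JSON.stringify({s}))'.encode('utf8'))
    return json.loads(stdout.decode('utf8'))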
Not including objects

json.loads()

- json.loads() doesn't accept undefined; you have to change it to null
- json.loads() only accepts double quotes
  - {"foo": 1, "bar": null}

Use this if you are sure that your JavaScript code only has double quotes on key names.

import json
import re

json_text = """{"foo": 1, "bar": undefined}"""
json_text = re.sub(r'("\s*:\s*)undefined(\s*[,}])', '\\1null\\2', json_text)
py_obj = json.loads(json_text)
ast.literal_eval()

- ast.literal_eval() doesn't accept undefined; you have to change it to None
- ast.literal_eval() doesn't accept null; you have to change it to None
- ast.literal_eval() doesn't accept true; you have to change it to True
- ast.literal_eval() doesn't accept false; you have to change it to False
- ast.literal_eval() accepts single and double quotes
  - {"foo": 1, "bar": None} or {'foo': 1, 'bar': None}

import ast
import re

js_obj = """{'foo': 1, 'bar': undefined}"""
js_obj = re.sub(r'([\'\"]\s*:\s*)undefined(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)null(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)NaN(\s*[,}])', '\\1None\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)true(\s*[,}])', '\\1True\\2', js_obj)
js_obj = re.sub(r'([\'\"]\s*:\s*)false(\s*[,}])', '\\1False\\2', js_obj)
py_obj = ast.literal_eval(js_obj)
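The substitutions above could also be folded into a single pass with a lookup table. This is only a sketch: the names JS_TO_PY and js_literal_eval are hypothetical, and like the original regexes it only rewrites values that directly follow a quoted key, so literals inside arrays are left untouched.

```python
import ast
import re

# JavaScript literal -> Python literal, mirroring the substitutions above
# (NaN is mapped to None there as well).
JS_TO_PY = {
    'undefined': 'None',
    'null': 'None',
    'NaN': 'None',
    'true': 'True',
    'false': 'False',
}

# quote + colon, then one of the literals, then ',' or '}'
LITERAL_RE = re.compile(
    r'''(['"]\s*:\s*)(undefined|null|NaN|true|false)(\s*[,}])'''
)


def js_literal_eval(js_obj):
    """Swap JS-only literals for Python ones, then call ast.literal_eval."""
    converted = LITERAL_RE.sub(
        lambda m: m.group(1) + JS_TO_PY[m.group(2)] + m.group(3),
        js_obj,
    )
    return ast.literal_eval(converted)
```

Keys still have to be quoted in the input, since ast.literal_eval parses Python literals and would reject a bare name such as foo.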
Simply:
import json
py_obj = json.loads(js_obj_stringified)
Above is the Python portion of the code. In the JavaScript portion of the code:
js_obj_stringified = JSON.stringify(data);
JSON.stringify turns a JavaScript object into JSON text and stores that JSON text in a string. It is a safe way to pass (via POST/GET) a JavaScript object to Python for processing.