How to parse invalid JSON using Lark?
Let's start by considering a simple JSON parser using Lark:
from lark import Lark, Transformer, v_args

json_grammar = r"""
    ?start: value

    ?value: object
          | array
          | string
          | SIGNED_NUMBER      -> number
          | "true"             -> true
          | "false"            -> false
          | "null"             -> null

    array  : "[" [value ("," value)*] "]"
    object : "{" [pair ("," pair)*] "}"
    pair   : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS

    %ignore WS
"""


class TreeToJson(Transformer):
    @v_args(inline=True)
    def string(self, s):
        # Strip the surrounding quotes and unescape embedded quotes
        return s[1:-1].replace('\\"', '"')

    array = list
    pair = tuple
    object = dict
    number = v_args(inline=True)(float)

    def null(self, _): return None
    def true(self, _): return True
    def false(self, _): return False


if __name__ == '__main__':
    # Note: Lark 1.0 renamed lexer='standard' to lexer='basic'
    json_parser = Lark(json_grammar, parser='lalr', lexer='standard', transformer=TreeToJson())
    parse = json_parser.parse

    dct = parse('''
        {
            "empty_object" : {},
            "empty_array"  : [],
            "booleans"     : { "YES" : true, "NO" : false },
            "numbers"      : [ 0, 1, -2, 3.3, 4.4e5, 6.6e-7 ],
            "strings"      : [ "This", [ "And" , "That", "And a \\"b" ] ],
            "nothing"      : null
        }
    ''')

    print(dct)
The example above is taken from the official Lark examples, and it parses valid JSON.
So far so good, but my question is: how can I extend this grammar and transformer so that they can also parse invalid JSON strings such as the one below?
dct = parse('''
    [
        // Item1
        { "key1": "value1" },
        // Item2
        { "key2": "value2", "key3": ["a","b",] },
        // Item3
        { "key4": [{"key5":"value5"},] },
    ]
''')
My main goal is to be able to parse Sublime Text assets (which are a superset of JSON). ST uses sublime_api.decode_value behind the scenes, but this function is closed source, so I can't use it. Also, I didn't find any PyPI library that works out of the box for this type of data, so I decided my best chance would be to write my own custom "invalid JSON" parser.
The demjson library is very good at parsing questionable JSON:
import demjson

# Named `text` rather than `str` to avoid shadowing the built-in type
text = '''
[
    // Item1
    { "key1": "value1" },
    // Item2
    { "key2": "value2", "key3": ["a","b",] },
    // Item3
    { "key4": [{"key5":"value5"},] },
]
'''

print(demjson.decode(text))
Result:
[{'key1': 'value1'}, {'key2': 'value2', 'key3': ['a', 'b']}, {'key4': [{'key5': 'value5'}]}]