Parsing an improperly formatted JSON file with Python

Question

I was given a dataset that I need to run some data analytics on. Each dataset is in a JSON file. The problem is that the JSON objects in each file are not separated by a ',', so I can't just do a simple json dump into a variable. I also can't add a ',' between the objects by hand, because each file has over 100 JSON objects and there are about 100 files, so that would take a long time. What can I do to fix this?

Answer 1

Since you haven't provided an example of your data, one option is to subclass json.JSONDecoder, like this:

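import json

class CommaInsertingDecoder(json.JSONDecoder):
    """Decoder that inserts the missing commas before parsing."""

    def decode(self, s, *args, **kwargs):
        # Naive fix: turn every space into ", ". This only works when
        # spaces appear nowhere else in the text (not after ':' and
        # not inside string values).
        s = s.replace(" ", ", ")
        return super().decode(s, *args, **kwargs)

a = json.loads('{"a":1 "b":2}', cls=CommaInsertingDecoder)
print(a)
# {'a': 1, 'b': 2}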

Basically, this just replaces each space with a comma. If you have spaces between the : and the value, write a regex that leaves those spaces alone.
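The regex idea could look something like the sketch below. The helper name fix_missing_commas and the pattern itself are illustrative assumptions, not part of the original answer. The pattern only inserts a comma where whitespace sits between the end of a value and the next "key": pair, so spaces after a colon or inside a simple string value are left alone. It is still a heuristic and has not been checked against unusual string contents.

```python
import json
import re

# Whitespace that follows the end of a value (digit, closing quote,
# closing bracket/brace, or the last letter of true/false/null) and
# precedes a quoted key followed by a colon.
MISSING_COMMA = re.compile(r'(?<=[\d"\]}el])\s+(?="[^"]*"\s*:)')

def fix_missing_commas(text):
    """Insert ', ' between members that are separated only by whitespace."""
    return MISSING_COMMA.sub(", ", text)

broken = '{"a": 1 "b": "x y" "c": [1, 2] "d": true}'
fixed = fix_missing_commas(broken)
result = json.loads(fixed)
print(result)
# {'a': 1, 'b': 'x y', 'c': [1, 2], 'd': True}
```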

I think you're referring to json.loads() rather than json.dumps().
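As a quick reminder of the difference, here is a minimal sketch with made-up sample values: json.loads() parses JSON text into Python objects, while json.dumps() goes the other way and serializes Python objects to JSON text.

```python
import json

text = '{"a": 1, "b": 2}'

obj = json.loads(text)   # JSON text -> Python dict
back = json.dumps(obj)   # Python dict -> JSON text

print(type(obj).__name__, type(back).__name__)
# dict str
```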

Answer 2

You could try littletable, which can import files containing consecutive, undelimited (even multiline) JSON objects.

import littletable as lt

data = """
{"a": 100, "b": 200, "c": 300}
{"a": 101, "b": 201, "c": 301}
{
    "a": 102, 
    "b": 202, 
    "c": 302

}
"""

json_table = lt.Table()
# for this post we import from the data using a Python string;
# in your program, just do json_table.json_import('data_file.json')
json_table.json_import(data)
for row in json_table:
    print(row.a, row.b, row.c)

Prints:

100 200 300
101 201 301
102 202 302

Once the data is imported, you can re-export it as CSV, or just use the table like a normal Python list and serialize it any way you like.

Disclosure: I am the author of littletable.
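For comparison, the same kind of input (consecutive JSON objects with nothing between them but whitespace) can probably also be read with the standard library alone. The sketch below is not taken from either answer; the helper name iter_json_objects is hypothetical. It relies on json.JSONDecoder.raw_decode(), which parses one value starting at a given index and returns the value together with the index where it stopped.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value found in text, in order."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() does not skip leading whitespace, so skip it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = """
{"a": 100, "b": 200, "c": 300}
{"a": 101, "b": 201, "c": 301}
{
    "a": 102,
    "b": 202,
    "c": 302
}
"""

records = list(iter_json_objects(sample))
for row in records:
    print(row["a"], row["b"], row["c"])
```

To process many files, the text of each file could be read (for example with pathlib.Path.read_text()) and passed to the same function.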

2 answers

Answer 1
0 2018-10-24 03:25:41

Answer 2
0 accepted 2018-10-24 11:23:05
