
Optimal way to parse Twitter JSON objects from one file/multiple files into Python

I have the Twitter dataset (multiple JSON files), but let's start with one file. I need to parse the JSON objects into Python, but json.loads() only parses one object. A similar question was asked here, but the solutions either do not work or are not good enough.

1- I cannot convert the JSON objects into a list, as that is inefficient and I have too much data. The proposed solutions also rely on "\n" delimiters, while my Twitter objects sit back to back as }{ with no newline between them, and I cannot add newlines manually. (The Twitter objects are also not stored one per line.)

2- The second suggestion is JSONStream, but there is not much about it in the official documentation.

3- Is there any other efficient way? One option I am considering is MongoDB, but I have never worked with MongoDB, so I don't know whether it can handle this.

The picture below shows the length of a tweet object and the }{ boundary between objects.

[image: tweet objects joined back to back with }{ and no newline]

import json

with open('sampledata.json', 'r', encoding='utf8') as json_file:
    while True:
        dataobj = json.load(json_file)
        print(dataobj)
print("Printing each JSON Decoded Object")

Error (the first object alone spans 287 lines):

raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 287 column 2 (char 10528)
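The "Extra data" error means the file holds several JSON objects back to back (hence the }{ boundaries) rather than a single document, which json.load cannot handle. The standard library can still deal with this: json.JSONDecoder.raw_decode parses one object and reports the index where it stopped, so you can walk the buffer object by object without needing newline delimiters. A minimal sketch (the parse_concatenated name and the sample data are illustrative, not from the question):

```python
import json

def parse_concatenated(text):
    """Yield each JSON object from a string of back-to-back objects."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip any whitespace between objects; raw_decode does not tolerate it.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        # raw_decode returns the parsed object and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Example with two tweet-like objects and no newline between them:
sample = '{"id": 1, "text": "hello"}{"id": 2, "text": "world"}'
tweets = list(parse_concatenated(sample))
print(len(tweets))  # 2
```

This reads the whole file into one string first, so for very large dumps you would want to read and decode in chunks, but for a single sample file it avoids any manual splitting on }{.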

The while loop used when reading the JSON file is not needed. You can use this to read a JSON file:

def read_json(path):
    with open(path, 'r') as file:
        return json.load(file)

my_data = read_json('sampledata.json')
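Note that json.load only works when each file contains a single JSON value. If that holds, extending the approach to the multiple-file case from the question is just a matter of globbing the directory. A sketch, assuming a hypothetical directory of such files (the read_all_json name and the pattern argument are illustrative):

```python
import json
from pathlib import Path

def read_all_json(directory, pattern='*.json'):
    """Parse every matching JSON file in `directory` into a list of objects."""
    results = []
    # sorted() makes the order deterministic across runs.
    for path in sorted(Path(directory).glob(pattern)):
        with open(path, 'r', encoding='utf8') as file:
            results.append(json.load(file))
    return results
```

For files that instead contain several concatenated objects (the }{ case), json.load will still raise "Extra data", and you would need an object-by-object decoder inside the loop rather than a single json.load call.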
