
How to convert a large JSON file into a CSV using Python

(Python 3.5) I am trying to parse a large user review.json file (1.3 GB) in Python and convert it to a .csv file. I have looked for a simple converter tool online, but most of them accept a maximum file size of 1 MB or are very expensive. As I am fairly new to Python, I have two questions.

  1. Is it even possible and efficient to do this, or should I be looking for another method?

  2. I tried the following code, but it only reads and writes the top 342 lines of my .json document and then returns an error.

File "C:\\Anaconda3\\lib\\json\\__init__.py", line 319, in loads
  return _default_decoder.decode(s)
File "C:\\Anaconda3\\lib\\json\\decoder.py", line 342, in decode
  raise JSONDecodeError("Extra data", s, end)
JSONDecodeError: Extra data

This is the code I'm using:

import csv
import json

infile = open("myfile.json","r")
outfile = open ("myfile.csv","w")

writer = csv.writer(outfile)

for row in json.loads(infile.read()):
  writer.writerow(row)

My .json example:

Link To small part of Json

My thought is that it is some type of error related to my for loop with json.loads, but I do not know enough about it. Is it possible to create a dictionary and convert just the values "user_id", "stars", and "text"? Or am I dreaming?

Any suggestions or criticism are appreciated.

This is not a JSON file; this is a file containing individual lines of JSON. You should parse each line individually.

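for row in infile:
  data = json.loads(row)
  # data is a dict; passing it directly to writerow would write only its keys,
  # so pick out the fields named in the question
  writer.writerow([data["user_id"], data["stars"], data["text"]])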
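As a fuller illustration of the line-by-line approach, here is a minimal sketch that streams the file and keeps only the three fields mentioned in the question. The function name `jsonl_to_csv`, the UTF-8 encoding, and the default field list are assumptions for illustration, not part of the original answer; it also assumes every non-blank line holds exactly one JSON object.

```python
import csv
import json


def jsonl_to_csv(in_path, out_path, fields=("user_id", "stars", "text")):
    """Stream a file with one JSON object per line into a CSV file.

    Returns the number of data rows written. Hypothetical helper, not from
    the original answer.
    """
    count = 0
    # newline="" lets the csv module control line endings
    # (this avoids blank rows between records on Windows).
    with open(in_path, "r", encoding="utf-8") as infile, \
         open(out_path, "w", encoding="utf-8", newline="") as outfile:
        # extrasaction="ignore" drops any keys that are not listed in `fields`.
        writer = csv.DictWriter(outfile, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for line in infile:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            writer.writerow(json.loads(line))
            count += 1
    return count
```

Because only one line is held in memory at a time, the size of the input file (1.3 GB in the question) should not be a problem for this approach. A call would look like `jsonl_to_csv("myfile.json", "myfile.csv")`.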

Sometimes it's not as easy as having one JSON definition per line of input. A JSON definition can spread out over multiple lines, and it's not necessarily easy to determine which are the start and end braces reading line by line (for example, if there are strings containing braces, or nested structures).

The answer is to use the raw_decode method of json.JSONDecoder to fetch the JSON definitions from the file one at a time. This will work for any set of concatenated valid JSON definitions. It's further described in my answer here: Importing wrongly concatenated JSONs in python
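A minimal sketch of the raw_decode approach follows. The generator name `iter_json_objects` is hypothetical, and the sketch assumes the concatenated JSON text has already been read into a single string, so for a very large file it would need to be adapted to read in chunks.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not accept leading whitespace, so skip it first.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it ended.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj
```

Since the decoder itself finds where each value ends, braces inside strings and nested structures are handled without any manual brace counting.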
