
Decode a large amount of “JSON-like” data in Python quickly

Say there are many (about 300,000) JSON files that take a long time (about 30 minutes) to load into a list of Python objects. Profiling revealed that it is in fact not the file access but the decoding that takes most of the time. Is there a format I could convert these files to that can be loaded much faster into a Python list of objects?
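Simplified, the loading code looks roughly like this (a minimal sketch; the directory name and the structure of the decoded objects are placeholders, not the actual data):

```python
import json
from pathlib import Path

def load_all(directory):
    """Decode every .json file in `directory` into a list of Python objects."""
    records = []
    for path in Path(directory).glob("*.json"):
        with open(path, "r", encoding="utf-8") as f:
            records.append(json.load(f))
    return records

# data = load_all("data/")  # ~300,000 small files in this scenario
```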

My attempt: I converted the files to ProtoBuf (aka Google's Protocol Buffers), but even though I got really small files (reduced to ~20% of their original size), the time to load them did not improve that dramatically (it still took more than 20 minutes to load them all).

You might be looking in the wrong direction with the conversion, as it will probably not cut your loading times as much as you would like. If decoding takes a lot of time, it will probably take quite some time with other formats as well, assuming the JSON decoder is not badly written. The standard library functions have decent implementations, and JSON is not a lousy format for data storage speed-wise.

You could try running your program with PyPy instead of the default CPython implementation, which I will assume you are using. PyPy could decrease the execution time tremendously: it has a faster JSON module and uses a JIT compiler that might speed up the rest of your program considerably as well.

If you are using Python 3, you could also try using ProcessPoolExecutor to run the file loading and data deserialization/decoding concurrently. You will have to experiment with the degree of concurrency; a good starting point is the number of your CPU cores, which you can then halve or double. If your program spends a lot of time waiting for I/O, run a higher degree of concurrency; if there is less I/O, try reducing it. If you write each worker so that it loads the data into Python objects and simply returns them, you should be able to cut your loading times significantly; see the sketch below. Note that you must use a process-based approach: because of the GIL, threads will not speed up the CPU-bound decoding.
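A minimal sketch of that approach, assuming the files live in a single directory and each worker just decodes one file and returns the resulting object (the directory name, worker count, and chunk size are illustrative and need tuning for your data):

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path

def load_one(path):
    """Decode a single JSON file into a Python object (runs in a worker process)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

def load_all(directory, workers=8):
    """Decode all .json files in `directory` in parallel, returning a list of objects."""
    paths = sorted(Path(directory).glob("*.json"))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() keeps results in the same order as the input paths;
        # chunksize reduces inter-process overhead for many small tasks.
        return list(pool.map(load_one, paths, chunksize=256))

if __name__ == "__main__":  # guard is required for process pools on Windows/macOS
    data = load_all("data/", workers=8)  # start `workers` near your CPU core count
```

Keep in mind that each decoded object is pickled back to the parent process, so if the individual objects are very large the inter-process transfer can become the new bottleneck.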

You could also use a faster JSON library, which could speed up your execution times two- or three-fold in the best case. In a real-world use case the speed-up will probably be smaller. Do note that these libraries are mostly C extensions built for CPython, so under PyPy you would need an alternative CFFI-based build if one exists, and PyPy has a good JSON module anyway.

Try ujson, it's quite a bit faster.
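ujson is designed as a drop-in replacement for the standard json module, so in many cases switching is a one-line change (a sketch; install it with pip install ujson first, and the file name here is just a placeholder):

```python
import ujson as json  # drop-in replacement for the standard library json module

with open("example.json", "r", encoding="utf-8") as f:
    obj = json.load(f)  # same call signature as the standard json.load
```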

"Decoding takes most of the time" can be seen as "building the Python objects takes all the time". Do you really need all these things as Python objects in RAM all the time? It must be quite a lot.

I'd consider using a proper database for, e.g., querying data of this size.

If you need mass processing of a different kind, e.g. stats or matrix processing, I'd take a look at pandas.
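For the pandas route, a hedged sketch: if each file holds one record (or a flat-ish dict), the decoded objects can be collected into a single DataFrame for stats-style processing (the directory name is a placeholder, and pd.json_normalize is just one way to flatten nested dicts):

```python
import json
from pathlib import Path
import pandas as pd

# Decode each file, then flatten the (possibly nested) dicts into one table.
records = []
for path in Path("data/").glob("*.json"):   # "data/" is a placeholder directory
    with open(path, "r", encoding="utf-8") as f:
        records.append(json.load(f))

df = pd.json_normalize(records)             # one row per file/record
# df.describe(), df.groupby(...), etc. for stats or matrix-style processing
```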
