
Best way to parse a huge JSON file in ruby

I'm having a hard time parsing a huge JSON file.

The file is over 1 GB. I've tried two gems, json-stream and yajl, and neither works.

Here's an example of what happens.

fileStr = File.read("hugeJSONfile.json")

^ This part is OK.

But when I try to load fileStr into a JSON hash (via json-stream or yajl), my computer freezes.

Any other ideas on how to do this more efficiently? Thank you.

Take a look at json-stream or yajl.

Key quote from the docs:

json-stream:

the document itself is never fully read into memory.

yajl:

The main benefit of this library is in its memory usage. Since it's able to parse the stream in chunks, its memory requirements are very, very low.

You register events you are looking for, and the parser yields keys/values as it reads through the JSON, instead of loading the whole document into a Ruby data structure (and consequently into memory).

Okay, I was able to parse it.

Honestly, this is not the most elegant solution, but given desperate times, one quick way to parse a huge JSON file is to examine the file manually, notice a pattern, and pluck out what you need.

In my case, here's what I did, in pseudocode:

fileStr = File.read("hugeJSONfile.json")
arr = fileStr.split("[some pattern]")  # a delimiter that separates records
arr.each do |str|
  # extract the desired value from str, e.g. with a regexp
end

Again, not the most elegant solution, but it's low-maintenance, and you can adapt it to whatever your crappy laptop can muster.
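One caveat: File.read still pulls the whole 1 GB into memory before the split. A variation on the same "pluck out what you need" idea is to stream the file with File.foreach and scan each line, so only one line is resident at a time. This sketch assumes one JSON object per line; the "id" field and extract_ids helper are made-up examples:

```ruby
require "tempfile"

# Pull every numeric "id" value out of a newline-delimited JSON file,
# streaming line by line instead of reading the whole file at once.
def extract_ids(path)
  ids = []
  File.foreach(path) do |line|
    if line =~ /"id"\s*:\s*(\d+)/
      ids << Regexp.last_match(1).to_i
    end
  end
  ids
end

# Usage with a small sample file:
file = Tempfile.new(["sample", ".json"])
file.write(%({"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n))
file.flush

result = extract_ids(file.path)
p result  # => [1, 2]
```

The regexp approach is just as fragile as the split-on-a-pattern trick (it ignores JSON escaping and nesting), but it keeps memory flat no matter how big the file is.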
