
Best way to parse a huge JSON file in Ruby

I'm having a hard time parsing a huge JSON file.

The file is larger than 1 GB. I've tried two gems, ruby-stream and yajl, and neither works.

Here's an example of what happens:

fileStr = File.read("hugeJSONfile.json")

^ This part is OK.

But when I try to load fileStr into a hash (via ruby-stream or yajl), my computer freezes.

Any other ideas on how to do this more efficiently? Thank you.

Take a look at json-stream or yajl.

Key quotes from the docs:

json-stream:

the document itself is never fully read into memory.

yajl:

The main benefit of this library is in its memory usage. Since it's able to parse the stream in chunks, its memory requirements are very, very low.

You register the events you are interested in, and the parser hands back keys and values as it reads through the JSON, instead of loading everything into a Ruby data structure (and consequently into memory).
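To make the "hand back values while reading" idea concrete, here is a minimal stdlib-only sketch. It is not the json-stream or yajl API; it is a hand-rolled stand-in for the same technique. The helper name each_array_object and the chunk_size parameter are hypothetical, and the sketch assumes the file is one top-level JSON array whose elements are objects.

```ruby
require "json"
require "stringio"

# Hypothetical helper (not part of json-stream or yajl): reads `io` in
# fixed-size chunks and yields each object of a top-level JSON array as
# soon as its closing brace has been seen.
# Assumption: every element of the top-level array is a JSON object.
def each_array_object(io, chunk_size: 16 * 1024)
  buffer = String.new # bytes of the object currently being collected
  depth = 0           # nesting depth of {} seen so far
  in_string = false   # true while inside a JSON string literal
  escaped = false     # true right after a backslash inside a string

  while (chunk = io.read(chunk_size))
    chunk.each_char do |ch|
      # Collect bytes only while inside an object (or when one starts).
      buffer << ch if depth > 0 || (!in_string && ch == "{")

      if in_string
        if escaped
          escaped = false
        elsif ch == "\\"
          escaped = true
        elsif ch == '"'
          in_string = false
        end
      elsif ch == '"'
        in_string = true
      elsif ch == "{"
        depth += 1
      elsif ch == "}"
        depth -= 1
        if depth.zero?
          # One complete element: parse it, hand it over, drop it.
          yield JSON.parse(buffer.force_encoding(Encoding::UTF_8))
          buffer = String.new
        end
      end
    end
  end
end

# Tiny in-memory stand-in for the huge file; a deliberately small
# chunk size shows that objects may span several chunks.
io = StringIO.new('[{"id": 1, "tags": ["a", "b"]}, {"id": 2, "note": "brace } in \"string\""}]')
ids = []
each_array_object(io, chunk_size: 7) { |obj| ids << obj["id"] }
puts ids.inspect
```

With a real file you would pass File.open("hugeJSONfile.json") instead of the StringIO. Only one element is held in memory at a time, though a character-by-character loop in Ruby is slow, so for a file over 1 GB one of the gems above is likely the better fit.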

Okay, I was able to parse it.

Honestly, this is not the most elegant solution, but in desperate times, one quick way to parse a huge JSON file is to examine the file manually, notice a pattern, and pluck out what you need.

In my case, here's what I did, in pseudocode:

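# Pseudocode: "[some pattern]" stands for whatever delimiter separates
# the pieces you care about in your file.
fileStr = File.read("hugeJSONfile.json")
arr = fileStr.split("[some pattern]")
arr.each do |str|
  # extract the desired value from str
end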

Again, not the most elegant solution, but it's low maintenance. Depending on your circumstances, just adapt it to what your crappy laptop can muster.
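If the pattern you split on is a fixed string, the same approach can avoid File.read entirely by letting File.foreach cut the file at that separator as it reads. This is only a sketch under assumptions: the helper name each_record is hypothetical, and it assumes every piece between separators is a complete JSON value (here, one object per line).

```ruby
require "json"
require "tempfile"

# Hypothetical helper: yields one parsed record at a time, so only the
# current piece of the file is held in memory.
# Assumption: each record is a complete JSON value followed by `separator`.
def each_record(path, separator = "\n")
  File.foreach(path, separator) do |piece|
    text = piece.delete_suffix(separator).strip
    next if text.empty? # skip blank pieces, e.g. a trailing separator

    yield JSON.parse(text)
  end
end

# Small temporary file standing in for the huge one.
sample = Tempfile.new(["records", ".json"])
sample.write(%({"id": 1, "name": "a"}\n{"id": 2, "name": "b"}\n))
sample.close

names = []
each_record(sample.path) { |record| names << record["name"] }
puts names.inspect
sample.unlink
```

If your pieces are not valid JSON on their own, replace the JSON.parse call with whatever extraction you were doing on each str.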

