How to best flatten NDJSON data in Python

I have a huge file (>400MB) of NDJSON-formatted data and would like to flatten it into a table format for further analysis.

I started iterating through the various objects manually, but some are rather deeply nested and might even change over time, so I was hoping for a more general approach.

I was certain the pandas library would offer something, but I could not find anything that helps in my case. The several other libraries I found (such as flatten_json) also do not seem to fully provide what I was hoping for. It all seems to be at a very early stage.

Is it possible that there is no good (fast and easy) solution for this at the moment?

Any help is appreciated.

pandas read_json has a boolean parameter, lines; set it to True to read NDJSON files:

import pandas as pd
data_frame = pd.read_json('ndjson_file.json', lines=True)  # one JSON object per line
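As far as I can tell, reading with lines=True parses each line but leaves nested objects as dict-valued columns, so a flattening step may still be needed. Below is a minimal sketch of one way to do that with only the standard library. The helper names flatten and iter_flat_records, the dotted key separator, and the sample records are all assumptions made for illustration; they are not part of pandas or of the original question.

```python
import io
import json


def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with `sep`.

    Lists are kept as-is; that is a simplification for this sketch.
    """
    flat = {}
    for key, value in obj.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects and merge their flattened keys.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def iter_flat_records(lines):
    """Yield one flattened dict per non-empty NDJSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield flatten(json.loads(line))


# Hypothetical sample data standing in for the real file.
sample = io.StringIO(
    '{"id": 1, "user": {"name": "a", "geo": {"lat": 1.5}}}\n'
    '{"id": 2, "user": {"name": "b"}, "tags": ["x"]}\n'
)
rows = list(iter_flat_records(sample))
```

For a real file, an open file handle can be passed in place of the sample, since the generator reads line by line and never holds the raw text of the whole file. The resulting list of dicts can then be handed to pd.DataFrame, where keys missing from a record become NaN. pandas also ships json_normalize, which performs a similar dotted-key flattening on a list of already parsed dicts and may be enough on its own.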
