How to best flatten NDJSON data in Python
I have a huge file (>400MB) of NDJSON-formatted data and would like to flatten it into a table format for further analysis.
I started iterating through the various objects manually, but some are rather deeply nested and might even change over time, so I was hoping for a more general approach.
I was certain the pandas library would offer something but could not find anything that helps in my case. Also, the several other libraries I found (such as flatten_json) do not seem to fully provide what I was hoping for. It all seems to be at a very early stage.
Is it possible that there is no good (fast and easy) solution for this at this time?
Any help is appreciated.
pandas read_json has a bool parameter, lines; set it to True to read NDJSON files:
import pandas as pd
data_frame = pd.read_json('ndjson_file.json', lines=True)
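Note that read_json with lines=True gives one row per line but leaves nested objects as dict values in their columns. If the nested keys should become columns of their own, one possible approach is to parse each line with the standard json module and hand the records to pd.json_normalize. The following is only a minimal sketch: the helper name flatten_ndjson and the sample records are made up for illustration and are not part of the original answer.

```python
import io
import json

import pandas as pd


def flatten_ndjson(lines, sep="."):
    """Parse one JSON object per non-empty line and flatten nested dicts.

    `lines` is any iterable of strings, e.g. an open text file.
    Nested keys are joined with `sep` ("user.geo.lat").
    """
    records = [json.loads(line) for line in lines if line.strip()]
    return pd.json_normalize(records, sep=sep)


# Hypothetical sample data standing in for the real NDJSON file.
sample = io.StringIO(
    '{"id": 1, "user": {"name": "a", "geo": {"lat": 1.5}}}\n'
    '{"id": 2, "user": {"name": "b"}, "tags": ["x", "y"]}\n'
)

flat = flatten_ndjson(sample)
print(flat.columns.tolist())
```

For a real file, the same helper could be called as `with open('ndjson_file.json') as f: flat = flatten_ndjson(f)`. Keys missing from a record end up as NaN, and list values (such as tags above) are kept as lists rather than expanded. This sketch holds all records in memory at once, which may or may not be acceptable for a file of several hundred MB.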