How do I read the first 100 lines of a JSON metadata file and write them to a smaller JSON file? [Python]

Question

I have a JSON metadata file with around 26 GB of data. I need to extract the first 100 lines into a new, smaller JSON file so that I can test and debug the code that follows on those 100 lines with as few changes as possible, and then apply the same code to the whole file once debugging is complete.

I have read about exporting JSON to CSV, but I want to keep the JSON structure and file type. Is it possible to do this with Python?

My file is JSON with some extra data, so I use a workaround to read it in the first place. It looks like this:


{"_id":{"$oid":"5b9fd47507b317551a7bfb8f"},"title":"It's Okay If You Didn't Like 'Boyhood', And Here Are Many Reasons Why","url":"https://m.huffpost.com/us/entry/6694772","article_text"

And I read it like this:

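import json
with open('metadata.json', 'r') as f:
    # One JSON object per line: join the lines with commas and wrap them in brackets to form one array.
    data = json.loads("[" + f.read().replace("}\n{", "},\n{") + "]")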

Thanks!

Answer 1

You can try:

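import json

# Assumes file.json holds a single valid JSON array; json.load reads all of it into memory.
with open('file.json') as ip_file:
    o = json.load(ip_file)

chunk_size = 100
# Write only the first chunk; mode 'w' keeps the output a single valid JSON array.
with open('new_file.json', 'w') as out_file:
    json.dump(o[:chunk_size], out_file)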
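
Since the file in the question appears to hold one JSON object per line rather than a single array, another option is to copy only the first 100 lines and never load the whole 26 GB. The sketch below is a suggestion under that assumption, not part of the original answer; the function names `write_first_lines` and `read_json_lines` and the UTF-8 encoding are hypothetical choices made for illustration.

```python
import json
from itertools import islice


def write_first_lines(src_path, dst_path, n=100):
    """Copy the first n lines of src_path to dst_path and return how many were copied."""
    copied = 0
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        # islice stops after n lines, so the rest of the source file is never read.
        for line in islice(src, n):
            dst.write(line)
            copied += 1
    return copied


def read_json_lines(path):
    """Parse a file that holds one JSON object per line into a list of dicts."""
    with open(path, "r", encoding="utf-8") as f:
        # Blank lines are skipped; every other line must be a complete JSON object.
        return [json.loads(line) for line in f if line.strip()]
```

Because the smaller file keeps the same one-object-per-line layout, the reading code from the question should work on it unchanged, for example after `write_first_lines('metadata.json', 'metadata_100.json')`. If an object ever spans more than one line, this line-based approach would not apply as written.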

Solution 1
0 2019-12-04 12:45:05