How to merge several json files into one using python

I have 6 JSON files that I would like to merge into one. I know I need to use glob, but I am having trouble understanding how to do it. I have attached the names of the files and the code I have tried. I have also created an empty JSON file called 'merge.json' that I would like all of the files to be merged into. They all have the same dictionary keys, but I simply want to concatenate the files, not merge all of the values under one key. Below is what the data looks like and what I would like it to look like when merged. Thank you!

file1 = 'file1.json'
...
file6 = 'file6.json'

file1:

{time:12, 'sizes':[1,2,3], 'scores':[80,100,77]},{time:42, 'sizes':[2,3,1], 'scores':[90,50,67]},{time:88, 'sizes':[162,124,1], 'scores':[90,100,97]}

file2:

{time:52, 'sizes':[192,242,3], 'scores':[80,100,77]},{time:482, 'sizes':[2,376,1], 'scores':[9,50,27]},{time:643, 'sizes':[93,12,90], 'scores':[10,400,97]}

...

merged:

{time:12, 'sizes':[1,2,3], 'scores':[80,100,77]},{time:42, 'sizes':[2,3,1], 'scores':[90,50,67]},{time:88, 'sizes':[162,124,1], 'scores':[90,100,97]},{time:52, 'sizes':[192,242,3], 'scores':[80,100,77]},{time:482, 'sizes':[2,376,1], 'scores':[9,50,27]},{time:643, 'sizes':[93,12,90], 'scores':[10,400,97]}

I saw this suggested in another thread:

import json
import glob

result = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "wb") as outfile:
     json.dump(result, outfile)

But I do not understand what goes in "*.json", or where the files are actually being opened. Thank you!

Let's turn this into a full working program using argparse so that the files can be specified on the command line. That way the directory holding the JSON files is chosen at run time, and you can use the shell's globbing to list them.

#!/usr/bin/env python

"""Read a list of JSON files holding a list of dictionaries and merge into
a single JSON file holding a list of all of the dictionaries"""

import sys
import argparse
import json

def do_merge(infiles, outfile):
    merged = []
    for infile in infiles:
        with open(infile, 'r', encoding='utf-8') as infp:
            data = json.load(infp)
            assert isinstance(data, list), "invalid input"
            merged.extend(data)
    with open(outfile, 'w', encoding="utf-8") as outfp:
        json.dump(merged, outfp)
    return 0

def main(argv):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('outfile', help="File to hold merged JSON")
    parser.add_argument('infiles', nargs='+', help="List of files to merge")
    args = parser.parse_args(argv)
    retval = do_merge(args.infiles, args.outfile)
    print(f"Merged {len(args.infiles)} files into {args.outfile}")
    return retval

if __name__ == "__main__":
    retval = main(sys.argv[1:])
    sys.exit(retval)

With sample JSON files set up as

mytest/file1.json

[{"time": 12, "sizes": [1, 2, 3], "scores": [80, 100, 77]},
{"time": 42, "sizes": [2, 3, 1], "scores": [90, 50, 67]},
{"time": 88, "sizes": [162, 124, 1], "scores": [90, 100, 97]}]

mytest/file2.json

[{"time": 52, "sizes": [192, 242, 3], "scores": [80, 100, 77]},
{"time": 482, "sizes": [2, 376, 1], "scores": [9, 50, 27]},
{"time": 643, "sizes": [93, 12, 90], "scores": [10, 400, 97]}]

And the test:

~/tmp$ ./jsonmerge.py mergedjson.json mytest/*.json
Merged 2 files into mergedjson.json
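
To confirm that the merged file really is one flat list, it can be loaded back and inspected. The following is only a sketch under the same list-of-dictionaries assumption. The helper name `merge_lists` and the temporary file layout are made up for illustration, and the merge is re-implemented inline rather than imported from jsonmerge.py.

```python
import json
import os
import tempfile

def merge_lists(infiles, outfile):
    """Concatenate the top-level lists from each input file into one list."""
    merged = []
    for path in infiles:
        with open(path, "r", encoding="utf-8") as fp:
            data = json.load(fp)
        if not isinstance(data, list):
            raise ValueError(f"{path} does not hold a JSON list")
        merged.extend(data)
    with open(outfile, "w", encoding="utf-8") as fp:
        json.dump(merged, fp)
    return merged

# Build two small sample files in a throwaway directory (hypothetical data).
with tempfile.TemporaryDirectory() as tmpdir:
    samples = {
        "file1.json": [{"time": 12, "sizes": [1, 2, 3], "scores": [80, 100, 77]}],
        "file2.json": [{"time": 52, "sizes": [192, 242, 3], "scores": [80, 100, 77]}],
    }
    paths = []
    for name, rows in samples.items():
        path = os.path.join(tmpdir, name)
        with open(path, "w", encoding="utf-8") as fp:
            json.dump(rows, fp)
        paths.append(path)

    out_path = os.path.join(tmpdir, "mergedjson.json")
    merge_lists(paths, out_path)

    # Reload the output to check that it parses and is a single flat list.
    with open(out_path, "r", encoding="utf-8") as fp:
        reloaded = json.load(fp)
    times = [row["time"] for row in reloaded]
    print(times)  # [12, 52]
```

Raising `ValueError` instead of using `assert` is a design choice in this sketch: assertions are skipped when Python runs with `-O`, so an explicit check keeps the input validation active.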

Put all your JSON files in one directory and run this code from that directory:

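import json
import glob

result = []
for f in glob.glob("*.json"):
    if f == "merged_file.json":
        continue  # skip the output left over from a previous run
    with open(f, "r", encoding="utf-8") as infile:
        # each file's parsed content becomes one element of result
        result.append(json.load(infile))

# json.dump writes str, so the output must be opened in text mode
with open("merged_file.json", "w", encoding="utf-8") as outfile:
    json.dump(result, outfile)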

This will produce a merged_file.json containing the data from all of the JSON files.

for f in glob.glob("*.json") iterates over every file in the current directory whose name ends in .json. The order is whatever the file system returns and is not guaranteed to be alphabetical, so wrap the call in sorted() if the order matters.
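
To see what the "*.json" pattern actually matches, here is a small sketch. It creates a throwaway directory containing hypothetical file names (file1.json, file2.json, notes.txt) and lists the matches. The helper name `list_json_files` is made up for this example. It sorts the result because `glob.glob` makes no promise about order.

```python
import glob
import os
import tempfile

def list_json_files(directory):
    """Return the paths of the .json files in `directory`, sorted by name."""
    pattern = os.path.join(directory, "*.json")
    return sorted(glob.glob(pattern))

# Demo with hypothetical file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    for name in ("file2.json", "file1.json", "notes.txt"):
        with open(os.path.join(tmpdir, name), "w", encoding="utf-8") as fp:
            fp.write("[]")
    matches = [os.path.basename(p) for p in list_json_files(tmpdir)]
    print(matches)  # ['file1.json', 'file2.json']
```

The pattern is relative to the current working directory unless a directory is joined in front of it, which is why the snippets in this thread expect to be run from the folder that holds the files.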

Maybe you can try something like the code below (check the repl.it code):

import glob

a = glob.glob('./*.json')
print (a)

merged = open("merged.json", "w+")
for i in a:
  with open(i, "r") as f:
    for j in f.readlines():
      merged.write(j)

merged.close()

If you intend to use the merged file as valid JSON, you must structure it properly. (This assumes that each individual file is valid JSON.)

Building on @tdelaney's answer:

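import glob

# List the inputs before creating the output, so the glob does not pick up
# merged_file.json itself.
infiles = [f for f in glob.glob("*.json") if f != "merged_file.json"]

with open("merged_file.json", "wb") as outfile:
    outfile.write(b"[")
    for counter, f in enumerate(infiles, start=1):
        with open(f, "rb") as infile:
            line = None
            for line in infile:
                outfile.write(line)
            # make sure each file's content ends with a newline
            if line is not None and not line.endswith(b"\n"):
                outfile.write(b"\n")
        # separate files with a comma, but not after the last one
        if counter < len(infiles):
            outfile.write(b",")
    outfile.write(b"]")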
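
One thing to keep in mind with this wrapping approach: if each input file already holds a JSON list, as in the sample files above, the result is a list of lists rather than one flat list. The sketch below illustrates this with in-memory strings standing in for file contents (the sample values are made up) and shows one possible way to flatten the result with `itertools.chain`.

```python
import json
from itertools import chain

# Stand-ins for the raw text of two input files, each holding a JSON list.
file_texts = ['[{"time": 12}, {"time": 42}]\n', '[{"time": 52}]\n']

# Wrapping the raw texts in brackets, separated by commas, gives valid JSON...
wrapped = "[" + ",".join(file_texts) + "]"
nested = json.loads(wrapped)

# ...but the outer list has one inner list per file, so flatten one level.
flat = list(chain.from_iterable(nested))

print(len(nested), len(flat))  # 2 3
```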
