How to merge several json files into one using python

I have 6 json files that I would like to merge into one. I know I need to use glob, but I am having trouble understanding how to do it. I have attached the names of the files and the code I have tried. I have also created an empty json file called 'merge.json' that I would like all the jsons to be merged into. They all have the same dictionary keys, but I would like to simply merge the files, not merge all of the values into one key. I have attached what the data looks like and what I would like it to look like when merged. Thank you!

file1 = 'file1.json'
...
file6 = 'file6.json'

file1:

{time:12, 'sizes':[1,2,3], 'scores':[80,100,77]},{time:42, 'sizes':[2,3,1], 'scores':[90,50,67]},{time:88, 'sizes':[162,124,1], 'scores':[90,100,97]}

file2:

{time:52, 'sizes':[192,242,3], 'scores':[80,100,77]},{time:482, 'sizes':[2,376,1], 'scores':[9,50,27]},{time:643, 'sizes':[93,12,90], 'scores':[10,400,97]}

...

merged:

{time:12, 'sizes':[1,2,3], 'scores':[80,100,77]},{time:42, 'sizes':[2,3,1], 'scores':[90,50,67]},{time:88, 'sizes':[162,124,1], 'scores':[90,100,97]},{time:52, 'sizes':[192,242,3], 'scores':[80,100,77]},{time:482, 'sizes':[2,376,1], 'scores':[9,50,27]},{time:643, 'sizes':[93,12,90], 'scores':[10,400,97]}

I saw on another thread to use:

import json
import glob

result = []
for f in glob.glob("*.json"):
    with open(f, "rb") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "wb") as outfile:
     json.dump(result, outfile)

But I do not understand what goes in "*.json", and where the files are being called. Thank you!

Let's turn this into a full working program that uses argparse, so the files can be given on the command line. The directory holding the JSON files is then chosen at run time, and the shell's globbing can be used to list them.

#!/usr/bin/env python

"""Read a list of JSON files holding a list of dictionaries and merge into
a single JSON file holding a list of all of the dictionaries"""

import sys
import argparse
import json

def do_merge(infiles, outfile):
    merged = []
    for infile in infiles:
        with open(infile, 'r', encoding='utf-8') as infp:
            data = json.load(infp)
            assert isinstance(data, list), "invalid input"
            merged.extend(data)
    with open(outfile, 'w', encoding="utf-8") as outfp:
        json.dump(merged, outfp)
    return 0

def main(argv):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('outfile', help="File to hold merged JSON")
    parser.add_argument('infiles', nargs='+', help="List of files to merge")
    args = parser.parse_args(argv)
    retval = do_merge(args.infiles, args.outfile)
    print(f"Merged {len(args.infiles)} files into {args.outfile}")
    return retval

if __name__ == "__main__":
    # sys.exit is always available; the bare exit() helper is meant for interactive use
    sys.exit(main(sys.argv[1:]))

With sample JSON files set up as

mytest/file1.json

[{"time": 12, "sizes": [1, 2, 3], "scores": [80, 100, 77]},
{"time": 42, "sizes": [2, 3, 1], "scores": [90, 50, 67]},
{"time": 88, "sizes": [162, 124, 1], "scores": [90, 100, 97]}]

mytest/file2.json

[{"time": 52, "sizes": [192, 242, 3], "scores": [80, 100, 77]},
{"time": 482, "sizes": [2, 376, 1], "scores": [9, 50, 27]},
{"time": 643, "sizes": [93, 12, 90], "scores": [10, 400, 97]}]

And the test

~/tmp$ ./jsonmerge.py mergedjson.json mytest/*.json
Merged 2 files into mergedjson.json
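
The script uses merged.extend(data) rather than append, which is what gives the flat list the question asks for. A minimal sketch of the difference, using two made-up in-memory lists in place of the real files (the names file1, file2, appended and extended are only for illustration):

```python
import json

# Parsed contents of two hypothetical input files (each is a list of dicts)
file1 = json.loads('[{"time": 12}, {"time": 42}]')
file2 = json.loads('[{"time": 52}]')

appended = []
extended = []
for data in (file1, file2):
    appended.append(data)   # nests: one inner list per file
    extended.extend(data)   # flattens: one entry per record

print(json.dumps(appended))  # [[{"time": 12}, {"time": 42}], [{"time": 52}]]
print(json.dumps(extended))  # [{"time": 12}, {"time": 42}, {"time": 52}]
```

With append the merged file holds one inner list per input file; with extend all records end up in a single list.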

Put all your JSON files in one directory and run this code from that same directory:

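import json
import glob

result = []
for f in glob.glob("*.json"):
    # Skip the output of a previous run, which also matches "*.json"
    if f == "merged_file.json":
        continue
    with open(f, "r", encoding="utf-8") as infile:
        result.append(json.load(infile))

# In Python 3 json.dump writes str, so the file must be opened in text mode
with open("merged_file.json", "w", encoding="utf-8") as outfile:
    json.dump(result, outfile)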

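This produces merged_file.json, a JSON array that holds the parsed content of each input file as one element.

for f in glob.glob("*.json") iterates over every file whose name ends in .json in the current working directory; "*" matches any run of characters, so no file names have to be listed by hand. The order is whatever the file system returns and is not guaranteed to be sorted, so wrap the call in sorted() if the order matters.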
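
If the goal is the flat list shown in the question, the same glob loop can be combined with extend. The following is only a sketch under a few assumptions: every input file holds a JSON list, the helper name merge_dir and its parameters are made up for this example, and the demonstration writes two tiny files into a temporary directory instead of touching real data.

```python
import glob
import json
import os
import tempfile

def merge_dir(directory, outname="merged_file.json"):
    """Merge every *.json file in `directory` into `outname` (hypothetical helper)."""
    outpath = os.path.join(directory, outname)
    merged = []
    # sorted() gives a repeatable order; glob itself promises none
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        if os.path.abspath(path) == os.path.abspath(outpath):
            continue  # do not read a previous merge result back in
        with open(path, "r", encoding="utf-8") as infile:
            merged.extend(json.load(infile))  # assumes each file holds a list
    with open(outpath, "w", encoding="utf-8") as outfile:
        json.dump(merged, outfile)
    return merged

# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for name, rows in [("file1.json", [{"time": 12}]), ("file2.json", [{"time": 52}])]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as fp:
            json.dump(rows, fp)
    print(merge_dir(tmp))
    print(merge_dir(tmp))  # second run: the output file is skipped
```

Both calls return the same two records, because the second run leaves out the merged_file.json written by the first.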

Maybe you can try something like the code below (checked on repl.it):

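import glob
import os

# Collect the inputs first, leaving out the output file itself
a = [i for i in glob.glob('./*.json') if os.path.basename(i) != "merged.json"]
print(a)

# Copy every input line by line; the result is a plain concatenation of the files
with open("merged.json", "w") as merged:
    for i in a:
        with open(i, "r") as f:
            for j in f:
                merged.write(j)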

If you intend to use the merged file as valid JSON, it has to be structured properly: the pieces must be wrapped in [ ] and separated by commas. (This assumes each input file is itself valid JSON.)

Working on @tdelaney's answer:

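import glob

outname = "merged_file.json"
# List the inputs before creating the output, which would otherwise match "*.json" too
infiles = [f for f in glob.glob("*.json") if f != outname]

with open(outname, "wb") as outfile:
    outfile.write(b"[")  # binary mode, so every write must be bytes
    for counter, f in enumerate(infiles, start=1):
        with open(f, "rb") as infile:
            line = None
            for line in infile:
                outfile.write(line)
            if line is not None and not line.endswith(b"\n"):
                outfile.write(b"\n")
        # A comma between files, but not after the last one
        if counter < len(infiles):
            outfile.write(b",")
    outfile.write(b"]")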
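
A file written this way is a JSON array with one element per input file. A quick sanity check is to load it back with json.loads (or json.load on the file) and, if a flat list is wanted, flatten it afterwards. The sketch below uses a hand-written string shaped like that output for two hypothetical files rather than a real merged_file.json; the names raw, parts and flat are only for illustration.

```python
import json

# Text shaped like the output of the byte-copying approach: the raw
# contents of two hypothetical files wrapped in [ ... ] and joined by a comma
raw = '[[{"time": 12}, {"time": 42}]\n,[{"time": 52}]\n]'

parts = json.loads(raw)  # raises json.JSONDecodeError if the text is malformed
# Assumes every part is itself a list of records
flat = [record for part in parts for record in part]
print(len(parts), len(flat))  # 2 3
```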
