How to merge several json files into one using python
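I have 6 json files that I would like to merge into one. I know I need to use glob, but I can't figure out how to use it. I've attached the file names and the code I tried. I also created an empty json file called "merge.json" that I'd like all the json files merged into. They all have the same dictionary keys, but I want to simply concatenate the files rather than merge all the values under one key. I've attached what the data looks like and what I'd like it to look like once merged. Thanks!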
file1 = 'file1.json'
...
file6 = 'file6.json'
File 1:
{time:12, 'sizes':[1,2,3], 'scores':[80,100,77]},{time:42, 'sizes':[2,3,1], 'scores':[90,50,67]},{time:88, 'sizes':[162,124,1], 'scores':[90,100,97]}
File 2:
{time:52, 'sizes':[192,242,3], 'scores':[80,100,77]},{time:482, 'sizes':[2,376,1], 'scores':[9,50,27]},{time:643, 'sizes':[93,12,90], 'scores':[10,400,97]}
...
Merged:
{time:12, 'sizes':[1,2,3], 'scores':[80,100,77]},{time:42, 'sizes':[2,3,1], 'scores':[90,50,67]},{time:88, 'sizes':[162,124,1], 'scores':[90,100,97]},{time:52, 'sizes':[192,242,3], 'scores':[80,100,77]},{time:482, 'sizes':[2,376,1], 'scores':[9,50,27]},{time:643, 'sizes':[93,12,90], 'scores':[10,400,97]}
I saw on another thread that I should use:
import json
import glob

result = []
for f in glob.glob("*.json"):
    # Text mode: json.dump writes str, which a binary-mode file rejects in Python 3
    with open(f, "r") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)
But I don't understand what goes in "*.json" or where the files get called. Thanks!
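Let's turn this into a complete working program using argparse, so that the files can be specified on the command line. You can then decide at run time which directory holds the JSON files you want, and use the shell's globbing to list them.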
#!/usr/bin/env python
"""Read a list of JSON files holding a list of dictionaries and merge into
a single JSON file holding a list of all of the dictionaries"""
import sys
import argparse
import json


def do_merge(infiles, outfile):
    merged = []
    for infile in infiles:
        with open(infile, 'r', encoding='utf-8') as infp:
            data = json.load(infp)
            assert isinstance(data, list), "invalid input"
            # extend (not append) keeps the result a flat list of dictionaries
            merged.extend(data)
    with open(outfile, 'w', encoding="utf-8") as outfp:
        json.dump(merged, outfp)
    return 0


def main(argv):
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('outfile', help="File to hold merged JSON")
    parser.add_argument('infiles', nargs='+', help="List of files to merge")
    args = parser.parse_args(argv)
    retval = do_merge(args.infiles, args.outfile)
    print(f"Merged {len(args.infiles)} files into {args.outfile}")
    return retval


if __name__ == "__main__":
    retval = main(sys.argv[1:])
    sys.exit(retval)
The sample JSON files are set up as
mytest/file1.json
[{"time": 12, "sizes": [1, 2, 3], "scores": [80, 100, 77]},
{"time": 42, "sizes": [2, 3, 1], "scores": [90, 50, 67]},
{"time": 88, "sizes": [162, 124, 1], "scores": [90, 100, 97]}]
mytest/file2.json
[{"time": 52, "sizes": [192, 242, 3], "scores": [80, 100, 77]},
{"time": 482, "sizes": [2, 376, 1], "scores": [9, 50, 27]},
{"time": 643, "sizes": [93, 12, 90], "scores": [10, 400, 97]}]
And a test run:
~/tmp$ ./jsonmerge.py mergedjson.json mytest/*.json
Merged 2 files into mergedjson.json
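To make the choice of `extend` in `do_merge` concrete, here is a minimal sketch comparing it with `append`, which the snippet from the other thread uses. The two small lists are hypothetical stand-ins for the parsed contents of two input files, and `merge_flat` / `merge_nested` are illustrative names, not part of the program above.

```python
import json

# Hypothetical stand-ins for the parsed contents of two input files
file1 = [{"time": 12}, {"time": 42}]
file2 = [{"time": 52}]


def merge_flat(lists):
    """Concatenate the items of every list into one flat list."""
    merged = []
    for data in lists:
        merged.extend(data)
    return merged


def merge_nested(lists):
    """Keep each input list as a single element of the result."""
    merged = []
    for data in lists:
        merged.append(data)
    return merged


flat = merge_flat([file1, file2])
nested = merge_nested([file1, file2])
print(json.dumps(flat))    # [{"time": 12}, {"time": 42}, {"time": 52}]
print(json.dumps(nested))  # [[{"time": 12}, {"time": 42}], [{"time": 52}]]
```

The flat form matches the "Merged" layout asked for in the question; the nested form keeps one sub-list per input file.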
Put all the JSON files in one directory and run this code from that same directory:
import json
import glob

result = []
for f in glob.glob("*.json"):
    # Text mode: json.dump writes str, which a binary-mode file rejects in Python 3
    with open(f, "r") as infile:
        result.append(json.load(infile))

with open("merged_file.json", "w") as outfile:
    json.dump(result, outfile)
This will produce merged_file.json, which will contain the merged data from all of the JSON files.
for f in glob.glob("*.json")
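will iterate over every json file in that directory, in whatever order the file system returns them (the order is not guaranteed to be sorted).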
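Two things are worth guarding against with a bare `glob.glob("*.json")`: the order depends on the file system, and on a second run the pattern also matches the output file itself. The sketch below is one possible way to handle both. `merge_dir` and its parameters are hypothetical names, and it keeps the `append` behaviour of the snippet above.

```python
import glob
import json
import os


def merge_dir(directory, out_name="merged_file.json"):
    """Merge every *.json file in `directory` into `out_name` in that directory.

    Returns the list of input paths that were merged, in the order used.
    """
    out_path = os.path.join(directory, out_name)
    # sorted() gives a stable, name-based order regardless of the file system
    paths = sorted(glob.glob(os.path.join(directory, "*.json")))
    # Skip the output file so a re-run does not merge the previous result
    paths = [p for p in paths if os.path.abspath(p) != os.path.abspath(out_path)]

    result = []
    for path in paths:
        with open(path, "r", encoding="utf-8") as infile:
            result.append(json.load(infile))

    with open(out_path, "w", encoding="utf-8") as outfile:
        json.dump(result, outfile)
    return paths
```

Calling `merge_dir(".")` would then behave like the snippet above, but with a predictable order and without re-reading `merged_file.json`.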
Maybe you can try something like the following (see the repl.it code):
import glob

a = glob.glob('./*.json')
print(a)
# Copy the raw text of every input file into merged.json, one after another
with open("merged.json", "w+") as merged:
    for i in a:
        with open(i, "r") as f:
            for j in f.readlines():
                merged.write(j)
If you intend to use the merged json as valid json, then you have to make it well-formed. (This assumes the individual json files are valid json.)
Building on @tdelaney's answer:
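import glob

# List the inputs before creating the output, so the output is not picked up by the glob
infiles = [f for f in glob.glob("*.json") if f != "merged_file.json"]

with open("merged_file.json", "wb") as outfile:
    outfile.write(b"[")
    for counter, f in enumerate(infiles, start=1):
        with open(f, "rb") as infile:
            line = None
            for line in infile:
                outfile.write(line)
            # Make sure each file's content ends with a newline
            if line is not None and not line.endswith(b"\n"):
                outfile.write(b"\n")
        # Comma between files, but not after the last one
        if counter < len(infiles):
            outfile.write(b",")
    outfile.write(b"]")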
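Since this approach stitches raw bytes together, it may be worth confirming that the result actually parses. A minimal sketch of such a check, where `check_merged` is a hypothetical helper name:

```python
import json


def check_merged(path):
    """Return the number of top-level items if `path` holds a JSON list.

    Raises json.JSONDecodeError if the file is not valid JSON, and
    ValueError if the top-level value is not a list.
    """
    with open(path, "r", encoding="utf-8") as fp:
        data = json.load(fp)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list")
    return len(data)
```

For example, `check_merged("merged_file.json")` should return the number of input files when each file was written as one element of the outer list.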