
Merge several json picking the odd one for each value in Python

I currently have N JSON input files that all share the same structure, but in each of them N - 1 of the values are set to "None". I want to combine them into a single JSON where, much like a git merge/patch, the set value (i.e., the one different from "None") is always picked. Here is a (fictitious) example:

json 1: {'a': 'aaa', ['b': 'None', 'c': 'None']}
json 2: {'a': 'None', ['b': 'bbb', 'c': 'None']}
json 3: {'a': 'None', ['b': 'None', 'c': 'ccc']}

expected result: {'a': 'aaa', ['b': 'bbb', 'c': 'ccc']}

At the moment, I'm thinking of using zip over all the input files, iterating over each value and picking whatever is not 'None' to compose the output file. However, I suspect there must be a cleaner way of doing it that I'm just not seeing. Thanks in advance!

The JSON in your example is not valid as written. You should verify that and update the code accordingly. For now, I have converted your JSON to the following format:

json_1 = {"a": "aaa", "b": "None", "c": "None"}
json_2 = {"a": "None", "b": "bbb", "c": "None"}
json_3 = {"a": "None", "b": "None", "c": "ccc"}

If the data is in files, you can load it like this:

import json

# json.load() takes the open file object itself, not the string from f.read()
with open('data.json', 'r') as f:
    json_1 = json.load(f)

and if the data is in string format, you can use:

import json

json_1 = json.loads('{"a": "aaa", "b": "None", "c": "None"}')
json_2 = json.loads('{"a": "None", "b": "bbb", "c": "None"}')
json_3 = json.loads('{"a": "None", "b": "None", "c": "ccc"}')
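Since the question mentions N input files, loading them in a loop may be more convenient than keeping one variable per file. The following is only a minimal sketch: the helper name `load_json_files` and the assumption that all inputs sit in one directory with a `.json` extension are illustrative and do not come from the question.

```python
import json
from pathlib import Path


def load_json_files(directory="."):
    """Return the parsed content of every *.json file in `directory`.

    Hypothetical helper: files are read in sorted file-name order so the
    result is reproducible between runs.
    """
    documents = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            # json.load() parses directly from the open file object
            documents.append(json.load(f))
    return documents
```

The returned list can then be fed to whichever merge strategy you pick.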

As for the solution, iterating over the JSON files will be the best option. An alternative approach is to first drop all the keys whose value is "None" and then merge what remains into a single dict. Sample code for that:

json_clean_1 = {k: v for k, v in json_1.items() if v != "None"}
json_clean_2 = {k: v for k, v in json_2.items() if v != "None"}
json_clean_3 = {k: v for k, v in json_3.items() if v != "None"}

# merge the cleaned dicts; on a duplicate key, the later dict wins
output_json = {**json_clean_1, **json_clean_2, **json_clean_3}
print(output_json)
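The same filter-and-merge idea can be generalised to any number of inputs. This is a sketch rather than a definitive implementation: the function name `merge_set_values` and the `placeholder` parameter are made up for illustration. It also makes two choices that go beyond the snippet above: a key that is "None" in every input is kept in the result, and two different set values for the same key raise an error instead of silently overwriting each other.

```python
def merge_set_values(dicts, placeholder="None"):
    """Merge dicts, keeping for each key the value that differs from `placeholder`."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            if value == placeholder:
                # Record the key so that keys unset in every input still appear.
                merged.setdefault(key, placeholder)
            elif merged.get(key, placeholder) not in (placeholder, value):
                # Two inputs disagree on a set value: refuse to guess.
                raise ValueError(f"conflicting values for key {key!r}")
            else:
                merged[key] = value
    return merged


inputs = [
    {"a": "aaa", "b": "None", "c": "None"},
    {"a": "None", "b": "bbb", "c": "None"},
    {"a": "None", "b": "None", "c": "ccc"},
]
print(merge_set_values(inputs))
# {'a': 'aaa', 'b': 'bbb', 'c': 'ccc'}
```

If silently letting the last file win is acceptable for your data, the conflict check can simply be removed.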

Based on the comment, you could use a solution similar to this. It should give you an output dataframe with a column containing the combined values:

import os
import json
import pandas as pd

# get files
json_files = [
    each_file for each_file in os.listdir(".") if each_file.endswith('.json')
]

# read files
dfs = []
for file in json_files:
    with open(file) as f:
        json_data = pd.json_normalize(json.loads(f.read()))
    dfs.append(json_data)

# combine and clean df
combined_df = pd.concat(dfs, ignore_index=True)
cleaned_df = combined_df.replace("None", pd.NA).transpose()

# df with required column
cleaned_df['combined_col'] = cleaned_df[cleaned_df.columns].apply(
    lambda x: ','.join(x.dropna().astype(str)), axis=1)
print(cleaned_df)
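For comparison, the pick-the-set-value idea can also be sketched in plain Python for nested data, without flattening through pandas. This assumes the real files may contain nested objects (the bracketed example in the question hints at that, but it is only a guess), and the name `merge_nested` is hypothetical. Unlike a strict merge, this version lets a later file overwrite an earlier set value.

```python
def merge_nested(docs, placeholder="None"):
    """Recursively merge dicts, preferring values that differ from `placeholder`."""
    merged = {}
    for doc in docs:
        for key, value in doc.items():
            if isinstance(value, dict):
                # Merge the sub-dict into whatever was collected for this key so far.
                existing = merged.get(key)
                previous = existing if isinstance(existing, dict) else {}
                merged[key] = merge_nested([previous, value], placeholder)
            elif value != placeholder or key not in merged:
                # Set values always win; a placeholder only fills a missing key.
                merged[key] = value
    return merged


nested_inputs = [
    {"a": "aaa", "group": {"b": "None", "c": "None"}},
    {"a": "None", "group": {"b": "bbb", "c": "None"}},
    {"a": "None", "group": {"b": "None", "c": "ccc"}},
]
print(merge_nested(nested_inputs))
# {'a': 'aaa', 'group': {'b': 'bbb', 'c': 'ccc'}}
```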
