
How to create a nested JSON from a pandas dataframe in Python

I have a pandas dataframe containing Windows 10 logs. I want to convert this pandas df to JSON. What is an efficient way to do this?

I have already managed to generate the default JSON from the pandas df, but it is not nested the way I want it. This is what I currently have:

{
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
    }
}
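(For reference, a flat per-row dump like the one above is what pandas produces by default; a minimal sketch of how it can be generated, assuming a dataframe with these columns and the sample values shown:)

import json
import pandas as pd

# Hypothetical dataframe with the same columns as in the question.
df = pd.DataFrame({
    "ProcessName": ["Firefox", "Excel", "Word"],
    "time": ["2019-07-12T00:00:00", "2019-07-12T00:00:00", "2019-07-12T01:30:00"],
    "timeFloat": [1562882400.0, 1562882400.0, 1562888000.0],
    "internal_time": [0.0, 0.0, 1.5533333333],
    "counter": [0, 0, 0],
})

# orient="index" keys each record by its row label ("0", "1", "2"), but stays flat.
print(json.dumps(json.loads(df.to_json(orient="index")), indent=4))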

I want it to look like this:

{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,  # ("counter" value)
            "Excel": 0
        }
    },
    "1": ...
}

It seems to me that you want to create JSON from data aggregated on ['time', 'timeFloat', 'internal_time'], which you can get by doing:

df.groupby(['time', 'timeFloat', 'internal_time'])

However, your example suggests that you want to keep the index keys ("0", "1", etc.), which is at odds with the aggregation just described.

The aggregated values from one time point:

"Firefox" : 0
"Excel" : 0 

seem to correspond to those index keys, which will be lost when you do the aggregation.
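To illustrate that point, here is a tiny sketch with made-up values: after a groupby-aggregate, the result gets a fresh positional index, so the original row labels "0", "1", "2" are gone.

import pandas as pd

# Made-up miniature of the problem: three rows labelled "0", "1", "2".
df = pd.DataFrame(
    {"time": ["t1", "t1", "t2"], "counter": [0, 0, 0]},
    index=["0", "1", "2"],
)

# After aggregating, the result carries a new RangeIndex; the original labels are lost.
agg = df.groupby("time", as_index=False).agg(list)
print(agg.index.tolist())  # [0, 1]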

However, if you decide to use aggregation, the code would look something like this:

# reading in data:

import pandas as pd
import json
json_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "1": {
        "ProcessName": "Excel",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "Word",
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0,
        "internal_time": 1.5533333333,
        "counter": 0
}}

df = pd.DataFrame.from_dict(json_data)
df = df.T  # transpose so rows are keyed by "0", "1", "2" and columns are the record fields

# processing: group rows that share the same time columns, collect the per-group
# process names and counters into lists, then zip them into a "Processes" dict
ddf = df.groupby(['time', 'timeFloat', 'internal_time'], as_index=False).agg(lambda x: list(x))
ddf['Processes'] = ddf.apply(lambda r: dict(zip(r['ProcessName'], r['counter'])), axis=1)
ddf = ddf.drop(['ProcessName', 'counter'], axis=1)

# printing the result:
json2 = json.loads(ddf.to_json(orient="records"))
print(json.dumps(json2, indent=4, sort_keys=True))

Result:

[
    {
        "Processes": {
            "Excel": 0,
            "Firefox": 0
        },
        "internal_time": 0.0,
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0
    },
    {
        "Processes": {
            "Word": 0
        },
        "internal_time": 1.5533333333,
        "time": "2019-07-12T01:30:00",
        "timeFloat": 1562888000.0
    }
]
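If you would rather have the enumerated keys ("0", "1", ...) from the desired output instead of a list of records, the result can be re-keyed afterwards; a small sketch assuming json2 from the snippet above:

# Re-key the list of records by their position to mimic the desired "0", "1", ... layout.
json_by_index = {str(i): record for i, record in enumerate(json2)}
print(json.dumps(json_by_index, indent=4, sort_keys=True))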

As I understand it, you need to group the objects by "time" and merge the counters from the different processes. If so, here is an example implementation:

import json

input_data = {
    "0": {
        "ProcessName": "Firefox",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "2": {
        "ProcessName": "ZXC",
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    },
    "3": {
        "ProcessName": "QWE",
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "counter": 0
    }
}


def group_input_data_by_time(dict_data):
    time_data = {}
    for value_dict in dict_data.values():
        counter = value_dict["counter"]
        process_name = value_dict["ProcessName"]
        time_ = value_dict["time"]
        common_data = {
            "time": time_,
            "timeFloat": value_dict["timeFloat"],
            "internal_time": value_dict["internal_time"],
        }
        # setdefault returns the entry already stored for this time if there is one,
        # otherwise it stores common_data and returns it
        common_data = time_data.setdefault(time_, common_data)
        processes = common_data.setdefault("Processes", {})
        processes[process_name] = counter

    # if required to change keys from time to enumerated
    result_dict = {}
    for ind, value in enumerate(time_data.values()):
        result_dict[str(ind)] = value

    return result_dict


print(json.dumps(group_input_data_by_time(input_data), indent=4))

Result is:

{
    "0": {
        "time": "2019-07-12T00:00:00",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "Firefox": 0,
            "ZXC": 0
        }
    },
    "1": {
        "time": "else_time",
        "timeFloat": 1562882400.0,
        "internal_time": 0.0,
        "Processes": {
            "QWE": 0
        }
    }
}
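Since the question starts from a pandas dataframe rather than a dict, note that the function above only needs a mapping of row label to record, which a dataframe can provide via to_dict(orient="index"); a sketch assuming the same input_data:

import json
import pandas as pd

# Rebuild a dataframe from the sample dict ("0", "2", "3" become the row labels) ...
df = pd.DataFrame.from_dict(input_data, orient="index")

# ... and feed its dict form straight into the grouping function defined above.
nested = group_input_data_by_time(df.to_dict(orient="index"))
print(json.dumps(nested, indent=4))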
