Aggregate by value in JSON object within Pandas Dataframe in Python

Question

I have loaded a JSON array into Python as a DataFrame using pandas. My Python code is below:

import pandas as pd

# Load the JSON array; each nested object ends up as a dict-valued cell
jsontxt = pd.read_json('array.json')

df = pd.DataFrame(jsontxt['Total-Hours'])

print(df)

The output is as below:

    Total-Hours

0   {'value': 3.0}
1   {'value': 2.0}
2   {'value': 1.0}
3   {'value': 5.0}
4   {'value': 3.0}
5   {'value': 5.0}

I want to group the data by the numeric value inside Total-Hours, something like this:

val = df.groupby(['Total-Hours']).mean()

My JSON is as below:

[
              {
                "key" : "Jacob",
                "doc_count" : 11,
                "Total-Hours" : {
                  "value" : 3.0
                },
                "Calculated-Category" : {
                  "value" : 4.0
                }
              },
              {
                "key" : "AH",
                "doc_count" : 2,
                "Total-Hours" : {
                  "value" : 2.0
                },
                "Calculated-Category" : {
                  "value" : 1.0
                }
              },
              {
                "key" : "FJ",
                "doc_count" : 1,
                "Total-Hours" : {
                  "value" : 1.0
                },
                "Calculated-Category" : {
                  "value" : 4.0
                }
              },
              {
                "key" : "Helen",
                "doc_count" : 1,
                "Total-Hours" : {
                  "value" : 5.0
                },
                "Calculated-Category" : {
                  "value" : 2.0
                }
              },
              {
                "key" : "Test",
                "doc_count" : 1,
                "Total-Hours" : {
                  "value" : 3.0
                },
                "Calculated-Category" : {
                  "value" : 3.0
                }
              },
              {
                "key" : "John",
                "doc_count" : 1,
                "Total-Hours" : {
                  "value" : 5.0
                },
                "Calculated-Category" : {
                  "value" : 3.0
                }
              }
            ]

However, that requires Total-Hours to be numeric. What is the best way to achieve this?

Answer 1

Pandas currently stores each row value as a dict, so you need to replace the column with the 'value' entry extracted from each dictionary.

Below I use a list comprehension that overwrites the column with the extracted values. I print the updated dataframe and then print the mean.

Also note that you don't need to create a new dataframe, since jsontxt is already one.

import pandas as pd

jsontxt = pd.read_json('array.json')

print(jsontxt)

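# Replace each {'value': x} dict with the bare number x
jsontxt['Total Hours'] = [x['value'] for x in jsontxt['Total Hours']]

print(jsontxt)

# Column-wise mean of the now-numeric data
print(jsontxt.mean())

Here is the output: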

      Total Hours
0  {'value': 3.0}
1  {'value': 2.0}
2  {'value': 1.0}
3  {'value': 5.0}
4  {'value': 3.0}
5  {'value': 5.0}
   Total Hours
0          3.0
1          2.0
2          1.0
3          5.0
4          3.0
5          5.0
Total Hours    3.166667
dtype: float64

Here is what my input file looked like (note that the key is 'Total Hours' with a space, not 'Total-Hours' as in the question):

{
    "Total Hours": [
        {"value": 3.0},
        {"value": 2.0},
        {"value": 1.0},
        {"value": 5.0},
        {"value": 3.0},
        {"value": 5.0}
    ]
}
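
Applied to the full JSON from the question, the same list-comprehension idea might look like the sketch below. It is only an illustration: the records are inlined instead of being read from array.json, and the names records and mean_by_hours are made up for the example. Once the dict cells are replaced by plain numbers, Total-Hours can be used as a groupby key.

```python
import pandas as pd

# Records copied from the question's JSON, inlined so the sketch runs on its own.
records = [
    {"key": "Jacob", "doc_count": 11, "Total-Hours": {"value": 3.0}, "Calculated-Category": {"value": 4.0}},
    {"key": "AH", "doc_count": 2, "Total-Hours": {"value": 2.0}, "Calculated-Category": {"value": 1.0}},
    {"key": "FJ", "doc_count": 1, "Total-Hours": {"value": 1.0}, "Calculated-Category": {"value": 4.0}},
    {"key": "Helen", "doc_count": 1, "Total-Hours": {"value": 5.0}, "Calculated-Category": {"value": 2.0}},
    {"key": "Test", "doc_count": 1, "Total-Hours": {"value": 3.0}, "Calculated-Category": {"value": 3.0}},
    {"key": "John", "doc_count": 1, "Total-Hours": {"value": 5.0}, "Calculated-Category": {"value": 3.0}},
]

df = pd.DataFrame(records)

# Replace each {'value': x} dict with the bare number x in both nested columns.
for col in ["Total-Hours", "Calculated-Category"]:
    df[col] = [d["value"] for d in df[col]]

# Total-Hours is numeric now, so it can be used as the grouping key.
mean_by_hours = df.groupby("Total-Hours")["Calculated-Category"].mean()
print(mean_by_hours)
```

Here the rows with 3.0 hours (Jacob and Test) are averaged together, as are the rows with 5.0 hours (Helen and John).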

Answer 2

You can treat your input as a dict and then select the Total Hours column. The apply call expands each dict into its own columns (here a single 'value' column), from which you can compute the mean:

 mean_hours = pd.DataFrame.from_dict(myjson)['Total Hours'].apply(pd.Series).mean()

or, from the full input in the question (where the key has a hyphen):

 mean_hours = pd.DataFrame.from_dict(myjson)['Total-Hours'].apply(pd.Series).mean()
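
As a further option that neither answer uses, pd.json_normalize can flatten the nested objects in one step. The sketch below assumes the parsed JSON is available as a Python list (called records here, trimmed to the fields it needs); the flattened column names such as 'Total-Hours.value' come from json_normalize's default '.' separator.

```python
import pandas as pd

# Parsed JSON from the question, trimmed to the fields used in this sketch.
records = [
    {"key": "Jacob", "Total-Hours": {"value": 3.0}},
    {"key": "AH", "Total-Hours": {"value": 2.0}},
    {"key": "FJ", "Total-Hours": {"value": 1.0}},
    {"key": "Helen", "Total-Hours": {"value": 5.0}},
    {"key": "Test", "Total-Hours": {"value": 3.0}},
    {"key": "John", "Total-Hours": {"value": 5.0}},
]

# Nested dicts become flat columns named '<outer>.<inner>'.
flat = pd.json_normalize(records)

# Overall mean of the numeric column.
overall_mean = flat["Total-Hours.value"].mean()

# Group on the numeric value and collect the keys that share it.
names_by_hours = flat.groupby("Total-Hours.value")["key"].agg(list)

print(overall_mean)
print(names_by_hours)
```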
