
Python - Adding fields and labels to nested json file

I have a dataframe as follows:

Name_ID | URL                    | Count | Rating
------------------------------------------------
ABC     | www.example.com/ABC    | 10    | 5
123     | www.example.com/123    | 9     | 4
XYZ     | www.example.com/XYZ    | 5     | 2
ABC111  | www.example.com/ABC111 | 5     | 2
ABC121  | www.example.com/ABC121 | 5     | 2
222     | www.example.com/222    | 5     | 3
abc222  | www.example.com/abc222 | 4     | 2
ABCaaa  | www.example.com/ABCaaa | 4     | 2

I am trying to create a JSON as follows:

{
    "name": "sampledata",
    "children": [
        {
            "name": 9,
            "children": [
                {
                    "name": 4,
                    "children": [
                        {
                            "name": "123",
                            "size": 100
                        }
                    ]
                }
            ]
        },
        {
            "name": 10,
            "children": [
                {
                    "name": 5,
                    "children": [
                        {
                            "name": "ABC",
                            "size": 100
                        }
                    ]
                }
            ]
        },
        {
            "name": 4,
            "children": [
                {
                    "name": 2,
                    "children": [
                        {
                            "name": "abc222",
                            "size": 50
                        },
                        {
                            "name": "ABCaaa",
                            "size": 50
                        }
                    ]
                }
            ]
        },
        {
            "name": 5,
            "children": [
                {
                    "name": 2,
                    "children": [
                        {
                            "name": "ABC",
                            "size": 16
                        },
                        {
                            "name": "ABC111",
                            "size": 16
                        },
                        {
                            "name": "ABC121",
                            "size": 16
                        }
                    ]
                },
                {
                    "name": 3,
                    "children": [
                        {
                            "name": "222",
                            "size": 50
                        }
                    ]
                }
            ]
        }
    ]
}

In order to do that:

  • I am trying to add labels such as "name" and "children" to the json while creating it.

I tried something like

results = [{"name": i, "children": j} for i,j in results.items()]

But it won't label it properly I believe.

  • Also, add another field with the label "size", which I am planning to calculate based on the formula:

     (Rating*Count*10000)/number_of_children_to_the_immediate_parent 
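To make the formula concrete, here is a small worked sketch for the rows with Count 5 and Rating 2 from the table above. The variable names are only illustrative. Three rows share that parent, so each leaf gets a third of Rating*Count*10000.

```python
# Worked example of the size formula for one (Count, Rating) group.
# Variable names are illustrative and not part of the original code.
count, rating = 5, 2
siblings = ["XYZ", "ABC111", "ABC121"]  # rows sharing Count=5, Rating=2

size = (rating * count * 10000) / len(siblings)
print(round(size, 2))  # 33333.33
```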

Here is my dirty code:

import json

import pandas as pd

data = [
    ('ABC', 'www.example.com/ABC', 10, 5),
    ('123', 'www.example.com/123', 9, 4),
    ('XYZ', 'www.example.com/XYZ', 5, 2),
    ('ABC111', 'www.example.com/ABC111', 5, 2),
    ('ABC121', 'www.example.com/ABC121', 5, 2),
    ('222', 'www.example.com/222', 5, 3),
    ('abc222', 'www.example.com/abc222', 4, 2),
    ('ABCaaa', 'www.example.com/ABCaaa', 4, 2),
]

df = pd.DataFrame(data, columns=['Name', 'URL', 'Count', 'Rating'])

# Group by a column name (not a one-element list) so the keys are scalars.
gp = df.groupby('Count')

dict_json = {"name": "flare"}
children = []

for name, group in gp:
    temp = {}
    temp["name"] = int(name)  # plain int so json can serialize it
    temp["children"] = []

    rgp = group.groupby('Rating')
    for n, g in rgp:
        temp2 = {}
        temp2["name"] = int(n)
        # list(...) so the rows can be counted, mutated and serialized
        temp2["children"] = list(g.reset_index().T.to_dict().values())
        for t in temp2["children"]:
            # floor division keeps the size an integer
            t["size"] = int(t["Rating"] * t["Count"] * 10000 // len(temp2["children"]))
            t["name"] = t["Name"]
            del t["Count"]
            del t["Rating"]
            del t["URL"]
            del t["Name"]
            del t["index"]
        temp["children"].append(temp2)
    children.append(temp)

dict_json["children"] = children

print(json.dumps(dict_json, indent=4))

Though the above code does print what I need, I am looking for more efficient and cleaner way to do the same, mainly because the actual dataset might be even more nested and complicated. Any help/suggestion will be much appreciated.

Quite an interesting problem and a great question!

You can improve your approach by reorganizing the code inside the loops and using list comprehensions. There is no need to delete keys or introduce temporary variables inside the loops:

dict_json = {"name": "flare"}

children = []
for name, group in df.groupby('Count'):
    # int(...) turns the numpy group keys into plain ints for json
    temp = {"name": int(name), "children": []}

    for n, g in group.groupby('Rating'):
        temp["children"].append({
            "name": int(n),
            "children": [
                {"name": row["Name"],
                 # floor division keeps the size an integer
                 "size": int(row["Rating"]) * int(row["Count"]) * 10000 // len(g)}
                for _, row in g.iterrows()
            ]
        })

    children.append(temp)

dict_json["children"] = children

Or, a "wrapped" version:

dict_json = {
    "name": "flare",
    "children": [
        {
            "name": int(name),
            "children": [
                {
                    "name": int(n),
                    "children": [
                        {
                            "name": row["Name"],
                            "size": int(row["Rating"]) * int(row["Count"]) * 10000 // len(g)
                        } for _, row in g.iterrows()
                    ]
                } for n, g in group.groupby('Rating')
            ]
        } for name, group in df.groupby('Count')
    ]
}

I'm getting the following dictionary printed for your sample input dataframe:

{
    "name": "flare", 
    "children": [
        {
            "name": 4, 
            "children": [
                {
                    "name": 2, 
                    "children": [
                        {
                            "name": "abc222", 
                            "size": 40000
                        }, 
                        {
                            "name": "ABCaaa", 
                            "size": 40000
                        }
                    ]
                }
            ]
        }, 
        {
            "name": 5, 
            "children": [
                {
                    "name": 2, 
                    "children": [
                        {
                            "name": "XYZ", 
                            "size": 33333
                        }, 
                        {
                            "name": "ABC111", 
                            "size": 33333
                        }, 
                        {
                            "name": "ABC121", 
                            "size": 33333
                        }
                    ]
                }, 
                {
                    "name": 3, 
                    "children": [
                        {
                            "name": "222", 
                            "size": 150000
                        }
                    ]
                }
            ]
        }, 
        {
            "name": 9, 
            "children": [
                {
                    "name": 4, 
                    "children": [
                        {
                            "name": "123", 
                            "size": 360000
                        }
                    ]
                }
            ]
        }, 
        {
            "name": 10, 
            "children": [
                {
                    "name": 5, 
                    "children": [
                        {
                            "name": "ABC", 
                            "size": 500000
                        }
                    ]
                }
            ]
        }
    ]
}
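Since the real dataset may be nested more deeply, the nested comprehensions could be generalized with a small recursive helper. The following is only a sketch: `nest` and its `levels` and `leaf` parameters are hypothetical names, it assumes integer-valued grouping columns, and it uses floor division so that the sizes come out as the integers shown above.

```python
import json

import pandas as pd


def nest(frame, levels, leaf="Name"):
    """Group `frame` by each column in `levels`, one nesting level per column.

    Hypothetical helper; assumes the grouping columns hold integers.
    """
    if not levels:
        # Leaf level: one entry per row, size split evenly among siblings.
        return [
            {"name": str(row[leaf]),
             "size": int(row["Rating"]) * int(row["Count"]) * 10000 // len(frame)}
            for _, row in frame.iterrows()
        ]
    first, rest = levels[0], levels[1:]
    return [
        {"name": int(key), "children": nest(group, rest, leaf)}
        for key, group in frame.groupby(first)
    ]


df = pd.DataFrame(
    [("ABC", 10, 5), ("123", 9, 4), ("XYZ", 5, 2), ("ABC111", 5, 2),
     ("ABC121", 5, 2), ("222", 5, 3), ("abc222", 4, 2), ("ABCaaa", 4, 2)],
    columns=["Name", "Count", "Rating"],
)

tree = {"name": "flare", "children": nest(df, ["Count", "Rating"])}
print(json.dumps(tree, indent=4))
```

Adding a deeper level then only means adding another column name to the `levels` list.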

If I understand correctly, what you want to do is put a groupby into a nested JSON. If that is the case, you could use a pandas groupby and cast it into a nested list of lists like so:

lol = pd.DataFrame(df.groupby(['Count','Rating'])\
               .apply(lambda x: list(x['Name_ID']))).reset_index().values.tolist()

lol should look something like this:

[['10', '5', ['ABC']],
['4', '2', ['abc222', 'ABCaaa']],
['5', '2', ['XYZ ', 'ABC111', 'ABC121']],
['5', '3', ['222 ']],
['9', '4', ['123 ']]]

After that you could loop over lol to put it into a dict, but since you want to set nested items you'll have to use autovivification (check it out):

class autovividict(dict):
   def __missing__(self, key):
      value = self[key] = type(self)()
      return value

d = autovividict()
for l in lol:
    d[l[0]][l[1]] = l[2]

Now you can use the json module for printing and exporting:

print(json.dumps(d, indent=2))

In case you need more than one groupby, you could concat your groups with pandas, cast to lol, remove any NaNs, and then loop. Let me know if a full example would help.
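The autovivified dict is keyed by Count and Rating rather than labelled with "name" and "children", so one more pass is needed to get the shape asked for in the question. Here is a minimal sketch of that pass, assuming a hand-written `lol` with integer keys and stripped names; `to_tree` is a hypothetical helper, and the "size" field is left out for brevity.

```python
import json


class autovividict(dict):
    """Dict that creates nested dicts on first access to a missing key."""
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value


def to_tree(node):
    """Turn nested dicts with leaf lists into name/children records (hypothetical helper)."""
    if isinstance(node, dict):
        return [{"name": key, "children": to_tree(child)}
                for key, child in node.items()]
    # Leaf level: a plain list of names.
    return [{"name": name} for name in node]


# Hand-written stand-in for the groupby result.
lol = [
    [10, 5, ["ABC"]],
    [4, 2, ["abc222", "ABCaaa"]],
    [5, 2, ["XYZ", "ABC111", "ABC121"]],
    [5, 3, ["222"]],
    [9, 4, ["123"]],
]

d = autovividict()
for count, rating, names in lol:
    d[count][rating] = names

tree = {"name": "flare", "children": to_tree(d)}
print(json.dumps(tree, indent=2))
```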

setup

from io import StringIO
import pandas as pd

txt = """Name_ID,URL,Count,Rating
ABC,www.example.com/ABC,10,5
123,www.example.com/123,9,4
XYZ,www.example.com/XYZ,5,2
ABC111,www.example.com/ABC111,5,2
ABC121,www.example.com/ABC121,5,2
222,www.example.com/222,5,3
abc222,www.example.com/abc222,4,2
ABCaaa,www.example.com/ABCaaa,4,2"""

df = pd.read_csv(StringIO(txt))

size
pre-calculate it

df['size'] = df.Count.mul(df.Rating) \
                     .mul(10000) \
                     .div(df.groupby(
                        ['Count', 'Rating']).Name_ID.transform('count')
                     ).astype(int)
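To see what the transform step contributes: it returns, for every row, how many rows share that row's (Count, Rating) pair, which is the divisor in the size formula. Here is a minimal sketch on the sample data. It uses floor division in place of div followed by astype(int), which gives the same result for these positive values, and `siblings` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name_ID": ["ABC", "123", "XYZ", "ABC111", "ABC121", "222", "abc222", "ABCaaa"],
    "Count": [10, 9, 5, 5, 5, 5, 4, 4],
    "Rating": [5, 4, 2, 2, 2, 3, 2, 2],
})

# Number of rows sharing each row's (Count, Rating) pair, aligned to df's index.
siblings = df.groupby(["Count", "Rating"])["Name_ID"].transform("count")

df["size"] = (df["Count"] * df["Rating"] * 10000 // siblings).astype(int)
print(df[["Name_ID", "size"]])
```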

solution
create recursive function

def h(d):
    if isinstance(d, pd.Series): d = d.to_frame().T
    rec_cond = d.index.nlevels > 1 or d.index.nunique() > 1
    return {'name': str(d.index[0]), 'size': str(d['size'].iloc[0])} if not rec_cond else \
        [dict(name=str(n), children=h(g.xs(n))) for n, g in d.groupby(level=0)]

demo

import json

my_dict = dict(name='flare', children=h(df.set_index(['Count', 'Rating', 'Name_ID'])))

json.dumps(my_dict)

'{"name": "flare", "children": [{"name": "4", "children": [{"name": "2", "children": [{"name": "ABCaaa", "children": {"name": "ABCaaa", "size": "40000"}}, {"name": "abc222", "children": {"name": "abc222", "size": "40000"}}]}]}, {"name": "5", "children": [{"name": "2", "children": [{"name": "ABC111", "children": {"name": "ABC111", "size": "33333"}}, {"name": "ABC121", "children": {"name": "ABC121", "size": "33333"}}, {"name": "XYZ", "children": {"name": "XYZ", "size": "33333"}}]}, {"name": "3", "children": {"name": "222", "size": "150000"}}]}, {"name": "9", "children": [{"name": "4", "children": {"name": "123", "size": "360000"}}]}, {"name": "10", "children": [{"name": "5", "children": {"name": "ABC", "size": "500000"}}]}]}'

my_dict

{'children': [{'children': [{'children': [{'children': {'name': 'ABCaaa',
        'size': '40000'},
       'name': 'ABCaaa'},
      {'children': {'name': 'abc222', 'size': '40000'}, 'name': 'abc222'}],
     'name': '2'}],
   'name': '4'},
  {'children': [{'children': [{'children': {'name': 'ABC111', 'size': '33333'},
       'name': 'ABC111'},
      {'children': {'name': 'ABC121', 'size': '33333'}, 'name': 'ABC121'},
      {'children': {'name': 'XYZ', 'size': '33333'}, 'name': 'XYZ'}],
     'name': '2'},
    {'children': {'name': '222', 'size': '150000'}, 'name': '3'}],
   'name': '5'},
  {'children': [{'children': {'name': '123', 'size': '360000'}, 'name': '4'}],
   'name': '9'},
  {'children': [{'children': {'name': 'ABC', 'size': '500000'}, 'name': '5'}],
   'name': '10'}],
 'name': 'flare'}
