
pandas df into nested json

A quite similar question was asked there, and was brilliantly answered by user1609452 in R. Still, it was a specific problem. I'd like to broaden the question. Let's take almost the same table (MyData):

ID  Location  L_size  L_color  Station  S_size  S_color  Category  C_size  C_color
1   Alpha     6       #000000  Zeta     3       #333333  Big       0.63    #306100
2   Alpha     6       #000000  Zeta     3       #333333  Medium    0.43    #458b00
3   Alpha     6       #000000  Zeta     3       #333333  small     0.47    #6aa232
4   Alpha     6       #000000  Yota     3       #4c4c4c  Big       0.85    #306100
5   Alpha     6       #000000  Yota     3       #4c4c4c  Medium    0.19    #458b00
6   Alpha     6       #000000  Yota     3       #4c4c4c  small     0.89    #6aa232
7   Beta      6       #191919  Theta    4       #666666  Big       0.09    #306100
8   Beta      6       #191919  Theta    4       #666666  Medium    0.33    #458b00
9   Beta      6       #191919  Theta    4       #666666  small     0.79    #6aa232
10  Beta      6       #191919  Theta    4       #666666  Big       0.89    #306100
11  Beta      6       #191919  Meta     3       #7f7f7f  Medium    0.71    #458b00
12  Beta      6       #191919  Meta     3       #7f7f7f  small     0.59    #6aa232
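To reproduce this, the table can be read straight into pandas. A minimal sketch, assuming the table is pasted as whitespace-separated text and that `ID` is meant as the row index rather than a data column:

```python
import io

import pandas as pd

# The table above, pasted as whitespace-separated text
RAW = """\
ID Location L_size L_color Station S_size S_color Category C_size C_color
1 Alpha 6 #000000 Zeta 3 #333333 Big 0.63 #306100
2 Alpha 6 #000000 Zeta 3 #333333 Medium 0.43 #458b00
3 Alpha 6 #000000 Zeta 3 #333333 small 0.47 #6aa232
4 Alpha 6 #000000 Yota 3 #4c4c4c Big 0.85 #306100
5 Alpha 6 #000000 Yota 3 #4c4c4c Medium 0.19 #458b00
6 Alpha 6 #000000 Yota 3 #4c4c4c small 0.89 #6aa232
7 Beta 6 #191919 Theta 4 #666666 Big 0.09 #306100
8 Beta 6 #191919 Theta 4 #666666 Medium 0.33 #458b00
9 Beta 6 #191919 Theta 4 #666666 small 0.79 #6aa232
10 Beta 6 #191919 Theta 4 #666666 Big 0.89 #306100
11 Beta 6 #191919 Meta 3 #7f7f7f Medium 0.71 #458b00
12 Beta 6 #191919 Meta 3 #7f7f7f small 0.59 #6aa232
"""

# sep=r'\s+' splits on any run of whitespace; read_csv does not treat '#'
# as a comment marker unless comment= is given, so the colors stay intact
MyData = pd.read_csv(io.StringIO(RAW), sep=r'\s+', index_col='ID')
print(MyData.head(3))
```

With `ID` as the index, the nine remaining columns are exactly the name and attribute columns of the three levels.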

Each category has one or more attributes (here, only one: size). What I'd like is to report the size for each parent and child in the JSON file:

{
  "name":"MyData",
  "size":12,
  "color":"#ffffff",
  "children":[
    {
      "name":"Alpha",
      "size":6,
      "color":"#000000",
      "children":[
        {
          "name":"Zeta",
          "size":3,
          "color":"#333333",
          "children":[
            {
              "name":"Big",
              "size":0.63,
              "color":"#306100"
            },
...

etc. I couldn't manage it in R, nor in pandas. Any ideas?

EDIT: My goal is to attach various attributes to the children, not only size. I added a color column for each main column. My real dataframe is large and holds a lot more information, but I can't paste it all here, for clarity's sake.

SECOND EDIT: Regarding chrisb's answer: it almost worked! Great update. Still, the JSON file isn't loaded properly by my JavaScript file. The file seems to be upside down (MyData is at the end), and a parent's information appears both before and after its children's:

{  
   "children":[  
      {  
         "color":"#000000",
         "children":[  
            {  
               "color":"#4c4c4c",
               "children":{  
                  "color":"#306100",
                  "name":"Big",
                  "size":0.85
               },
               "name":"Yota",
               "size":3
            },
            {  
               "color":"#333333",
               "children":{  
                  "color":"#306100",
                  "name":"Big",
                  "size":0.63
               },
               "name":"Zeta",
               "size":3
            }
         ],
         "name":"Alpha",
         "size":6
      },
      {  
         "color":"#191919",
         "children":[  
            {  
               "color":"#7f7f7f",
               "children":{  
                  "color":"#458b00",
                  "name":"Medium",
                  "size":0.71
               },
               "name":"Meta",
               "size":3
            },
            {  
               "color":"#666666",
               "children":{  
                  "color":"#306100",
                  "name":"Big",
                  "size":0.09
               },
               "name":"Theta",
               "size":4
            }
         ],
         "name":"Beta",
         "size":6
      }
   ],
   "name":"MyData",
   "size":12
}

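Two features of this output may explain why it doesn't load. First, key order: JSON objects are unordered by definition, and before Python 3.7 (including Python 2, which the `print json.dumps(data)` line suggests) a plain `dict` does not keep insertion order, so `"name"` can land after `"children"`. Second, at the leaf level `"children"` is a single object rather than a list, while a hierarchy layout on the JavaScript side (assumed here to be D3-style) typically expects an array at every level. A minimal sketch that addresses both when building nodes, using `collections.OrderedDict` so the key order also holds on older Pythons; `make_node` is a hypothetical helper, not part of the answer:

```python
import json
from collections import OrderedDict


def make_node(name, size, color, children=None):
    """Build one tree node with keys in a fixed order: name, size, color, children."""
    node = OrderedDict([('name', name), ('size', size), ('color', color)])
    if children is not None:
        # Always store children as a list, even when there is only one child
        node['children'] = list(children)
    return node


leaf = make_node('Big', 0.63, '#306100')
station = make_node('Zeta', 3, '#333333', [leaf])
print(json.dumps(station))
# {"name": "Zeta", "size": 3, "color": "#333333", "children": [{"name": "Big", "size": 0.63, "color": "#306100"}]}
```

On Python 3.7+ a plain `dict` built in the same order gives the same output; `OrderedDict` only matters on older versions.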
LAST EDIT: Works fine. Chris dropped the last part of his script when he updated the answer, so here it is. Thanks Chris!

import json

data = {'name': 'MyData',
        'size': len(MyData),
        'children': make_children(MyData, levels)}

print(json.dumps(data, indent=2))
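Since the result is meant to be loaded from JavaScript, writing it straight to a file with `json.dump` avoids copying printed output by hand. A minimal sketch, assuming a hypothetical output path `mydata.json` and a small stand-in `data` dict that holds only plain Python values (dict, list, str, int, float):

```python
import json

# Stand-in for the tree; assumed to contain only plain Python values
data = {'name': 'MyData', 'size': 12, 'children': []}

# Write the tree to a file the JavaScript side can fetch;
# 'mydata.json' is a hypothetical path
with open('mydata.json', 'w') as f:
    json.dump(data, f, indent=2)

# Reading it back is a quick check that the file is valid JSON
with open('mydata.json') as f:
    print(json.load(f)['name'])
```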

First, you need some kind of mapping of what makes up each level. I'm using tuples of the column that defines the "name" and the prefix shared by the other attributes you want from that level, like this:

levels = [('Location', 'L_'),
          ('Station', 'S_'),
          ('Category', 'C_')]
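To see what this mapping picks up at each level, here is a small sketch using the column names from the table; `level_keys` is a hypothetical helper that mirrors the column-selection step of the recursive function. It slices the prefix off instead of calling `str.strip(prefix)`, because `strip` removes any of the given characters from both ends rather than a prefix:

```python
def level_keys(columns, name, prefix):
    """Map each output key to its source column for one level of the tree."""
    # The name column, then every column carrying this level's prefix
    level_cols = [name] + [c for c in columns if c.startswith(prefix)]
    # 'L_size' -> 'size'; slicing removes exactly the prefix
    keys = ['name'] + [c[len(prefix):] for c in level_cols[1:]]
    return dict(zip(keys, level_cols))


columns = ['Location', 'L_size', 'L_color', 'Station', 'S_size', 'S_color',
           'Category', 'C_size', 'C_color']
levels = [('Location', 'L_'), ('Station', 'S_'), ('Category', 'C_')]

for name, prefix in levels:
    print(level_keys(columns, name, prefix))
# {'name': 'Location', 'size': 'L_size', 'color': 'L_color'}
# {'name': 'Station', 'size': 'S_size', 'color': 'S_color'}
# {'name': 'Category', 'size': 'C_size', 'color': 'C_color'}

# strip() removes characters, so a column like 'L_LAT' would lose too much
print('L_LAT'.strip('L_'), 'L_LAT'[len('L_'):])  # AT LAT
```

On Python 3.9+, `c.removeprefix(prefix)` does the same as the slice.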

Then it's a similar recursive function, except that at each step the extra columns are picked up too (by finding the columns that start with the prefix) and added to the tree by zipping the column names with their values. There's room to clean this up, but it should at least give you the idea.

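def _plain(values):
    # numpy scalars (e.g. numpy.int64 in groupby keys) -> plain Python values,
    # which json.dumps can serialize on Python 3
    return [v.item() if hasattr(v, 'item') else v for v in values]


def make_children(df, levels):
    # levels[0] is (name column, attribute prefix), e.g. ('Location', 'L_')
    name, prefix = levels[0]
    # the name column plus every column that carries this level's prefix
    level_cols = [name] + [c for c in df if c.startswith(prefix)]
    # 'L_size' -> 'size': slice the prefix off (str.strip removes characters, not a prefix)
    key_names = ['name'] + [c[len(prefix):] for c in level_cols[1:]]

    if len(levels) == 1:
        # leaf level: one node per row, returned as a list like the other levels
        return [dict(zip(key_names, _plain(row)))
                for row in df[level_cols].itertuples(index=False)]

    data = []
    # sort=False keeps the table's row order (Zeta before Yota)
    for keys, df_gb in df.groupby(level_cols, sort=False):
        node = dict(zip(key_names, _plain(keys)))
        node['children'] = make_children(df_gb, levels[1:])
        data.append(node)
    return data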
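In the SECOND EDIT output each station has a single leaf (only "Big" under Zeta, for instance) although the table has three categories per station, which suggests rows were dropped at the last level. Since every row of `MyData` should end up as exactly one leaf, a quick sanity check is to count the leaves and compare with `len(MyData)`. A minimal sketch; `count_leaves` is a hypothetical helper, not part of the answer, and it also flags a `"children"` value that isn't a list:

```python
def count_leaves(node):
    """Count leaf nodes (nodes without a 'children' key) under a node or list of nodes."""
    if isinstance(node, list):
        return sum(count_leaves(n) for n in node)
    children = node.get('children')
    if children is None:
        return 1
    if not isinstance(children, list):
        # e.g. a single dict where a list of child nodes was expected
        raise TypeError('%r: "children" should be a list, got %s'
                        % (node.get('name'), type(children).__name__))
    return sum(count_leaves(c) for c in children)


tree = {'name': 'MyData', 'children': [
    {'name': 'Alpha', 'children': [
        {'name': 'Zeta', 'children': [{'name': 'Big'}, {'name': 'Medium'}]}]}]}
print(count_leaves(tree))  # 2
```

For the real data, the hypothetical check would be `count_leaves(data) == len(MyData)`.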
