Create nested dictionary from Pandas DataFrame

Question

I have a requirement to create a nested dictionary from a Pandas DataFrame.

Below is an example dataset in CSV format:

hostname,nic,vlan,status
server1,eth0,100,enabled
server1,eth2,200,enabled
server2,eth0,100
server2,eth1,100,enabled
server2,eth2,200
server1,eth1,100,disabled

Once the CSV is imported as a DataFrame I have:

>>> import pandas as pd
>>> 
>>> df = pd.read_csv('test.csv')
>>> 
>>> df
  hostname   nic  vlan    status
0  server1  eth0   100   enabled
1  server1  eth2   200   enabled
2  server2  eth0   100       NaN
3  server2  eth1   100   enabled
4  server2  eth2   200       NaN
5  server1  eth1   100  disabled

The output nested dictionary/JSON needs to group by the first two columns (hostname and nic), for example:

{
  "hostname": {
    "server1": {
      "nic": {
        "eth0": {
          "vlan": 100,
          "status": "enabled"
        },
        "eth1": {
          "vlan": 100,
          "status": "disabled"
        },
        "eth2": {
          "vlan": 200,
          "status": "enabled"
        }
      }
    },
    "server2": {
      "nic": {
        "eth0": {
          "vlan": 100
        },
        "eth1": {
          "vlan": 100,
          "status": "enabled"
        },
        "eth2": {
          "vlan": 200
        }
      }
    }
  }
}

I need to account for:

Missing data: not all rows will include 'status'. When a value is missing, that key is simply skipped in the output dictionary.
Hostnames in the first column may be listed out of order. For example, rows 0, 1 and 5 must all be grouped under server1 in the output dictionary.
Extra columns beyond vlan and status may be added in future. These must be correctly grouped under hostname and nic.

I have looked at groupby and MultiIndex in the Pandas documentation, but as a newcomer I have got stuck.

Any help is appreciated on the best method to achieve this.

Answer 1

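It may help to sort the df by the grouping columns first: df_new = df.sort_values(["hostname", "nic"]). Note that df.groupby(["hostname", "nic"], as_index=False) on its own returns a GroupBy object, not a DataFrame, so it cannot be serialized directly; you would have to aggregate or iterate over the groups first.

You can then use df_new.to_json(orient='records', lines=True) to convert your df to JSON (as jtweeder mentions in comments). This writes one flat JSON object per row, with missing values as null, so it is a starting point rather than the nested structure itself. Once you get the desired format and would like to write it out, you can do something like:

with open('temp.json', 'w') as f: f.write(df_new.to_json(orient='records', lines=True))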
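
If you need the exact nested layout from the question, one possible approach is to build the dictionary row by row. The sketch below is only an illustration: the helper name to_nested and its outer/inner parameters are made up for this example and are not part of pandas. It assumes the two key columns are never empty. Missing cells are dropped with dropna(), so a row without 'status' simply has no 'status' key, and any extra columns added later are carried along automatically.

```python
import io
import json

import pandas as pd

# Sample data from the question; some rows have no 'status' value.
CSV_TEXT = """hostname,nic,vlan,status
server1,eth0,100,enabled
server1,eth2,200,enabled
server2,eth0,100
server2,eth1,100,enabled
server2,eth2,200
server1,eth1,100,disabled
"""


def to_nested(df, outer="hostname", inner="nic"):
    """Hypothetical helper: nest all remaining columns under outer/inner keys."""
    nested = {outer: {}}
    # Sorting only makes the key order predictable; grouping works without it.
    for _, row in df.sort_values([outer, inner]).iterrows():
        # Drop NaN cells so missing fields are skipped in the output.
        fields = row.dropna().to_dict()
        outer_key = fields.pop(outer)
        inner_key = fields.pop(inner)
        # Convert any NumPy scalars to plain Python values so json.dumps accepts them.
        fields = {k: (v.item() if hasattr(v, "item") else v) for k, v in fields.items()}
        nested[outer].setdefault(outer_key, {inner: {}})[inner][inner_key] = fields
    return nested


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = to_nested(df)
print(json.dumps(result, indent=2))
```

To write the result to a file instead of printing it, json.dump(result, f, indent=2) inside the same kind of with open(...) block should work.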

solution1
0 ACCEPTED 2019-03-20 19:58:54