
Create nested dictionary from Pandas DataFrame

I have a requirement to create a nested dictionary from a Pandas DataFrame.

Below is an example dataset in CSV format:

hostname,nic,vlan,status
server1,eth0,100,enabled
server1,eth2,200,enabled
server2,eth0,100
server2,eth1,100,enabled
server2,eth2,200
server1,eth1,100,disabled

Once the CSV is imported as a DataFrame I have:

>>> import pandas as pd
>>> 
>>> df = pd.read_csv('test.csv')
>>> 
>>> df
  hostname   nic  vlan    status
0  server1  eth0   100   enabled
1  server1  eth2   200   enabled
2  server2  eth0   100       NaN
3  server2  eth1   100   enabled
4  server2  eth2   200       NaN
5  server1  eth1   100  disabled
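To reproduce this without a test.csv file, the sample can be read from an in-memory string. This is a minimal sketch that assumes the CSV text above is pasted into a hypothetical CSV_TEXT constant. It also confirms that read_csv pads the two short server2 rows with NaN in the status column, which is the missing data the output has to skip.

```python
import io

import pandas as pd

# Hypothetical constant holding the sample CSV from the question.
# The server2/eth0 and server2/eth2 rows have only three fields.
CSV_TEXT = """hostname,nic,vlan,status
server1,eth0,100,enabled
server1,eth2,200,enabled
server2,eth0,100
server2,eth1,100,enabled
server2,eth2,200
server1,eth1,100,disabled
"""

# io.StringIO lets read_csv treat the string like a file on disk.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# Rows with fewer fields than the header get NaN in the missing columns.
missing_status = df["status"].isna()
print(df)
print(df.loc[missing_status, ["hostname", "nic"]])
```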

The output nested dictionary/JSON needs to group by the first two columns (hostname and nic), for example:

{
  "hostname": {
    "server1": {
      "nic": {
        "eth0": {
          "vlan": 100,
          "status": "enabled"
        },
        "eth1": {
          "vlan": 100,
          "status": "disabled"
        },
        "eth2": {
          "vlan": 200,
          "status": "enabled"
        }
      }
    },
    "server2": {
      "nic": {
        "eth0": {
          "vlan": 100
        },
        "eth1": {
          "vlan": 100,
          "status": "enabled"
        },
        "eth2": {
          "vlan": 200
        }
      }
    }
  }
}

I need to account for:

  • Missing data: for example, not all rows include 'status'. When a value is missing, it should simply be left out of the output dictionary.
  • Hostnames in the first column may be listed out of order. For example, rows 0, 1 and 5 must all be grouped under server1 in the output dictionary.
  • Extra columns beyond vlan and status may be added in future. These must also be grouped under hostname and nic.
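One way to meet all three requirements is to group on the two key columns and build the dictionary in a loop. This is a minimal sketch rather than the only approach: build_nested and to_native are hypothetical helper names, and it assumes each (hostname, nic) pair appears at most once (if a pair repeats, only its first row is used). Every column other than hostname and nic is treated as a leaf value, so future columns are picked up automatically, and NaN values are dropped per row.

```python
import io
import json

import pandas as pd

# Hypothetical constant holding the sample CSV from the question.
CSV_TEXT = """hostname,nic,vlan,status
server1,eth0,100,enabled
server1,eth2,200,enabled
server2,eth0,100
server2,eth1,100,enabled
server2,eth2,200
server1,eth1,100,disabled
"""


def to_native(value):
    """Turn NumPy scalars (e.g. numpy.int64) into plain Python values for json."""
    return value.item() if hasattr(value, "item") else value


def build_nested(df, host_col="hostname", nic_col="nic"):
    """Nest rows as {host_col: {host: {nic_col: {nic: {column: value}}}}}."""
    # Everything except the two key columns is leaf data, so columns
    # added to the CSV later are included without code changes.
    value_cols = [c for c in df.columns if c not in (host_col, nic_col)]
    hosts = {}
    # groupby collects rows by key regardless of their order in the file.
    for (host, nic), group in df.groupby([host_col, nic_col]):
        row = group.iloc[0]  # assumes one row per (hostname, nic) pair
        # Skip NaN so a missing status is left out instead of written as null.
        leaf = {col: to_native(row[col]) for col in value_cols if pd.notna(row[col])}
        hosts.setdefault(host, {nic_col: {}})[nic_col][nic] = leaf
    return {host_col: hosts}


df = pd.read_csv(io.StringIO(CSV_TEXT))
nested = build_nested(df)
print(json.dumps(nested, indent=2))
```

groupby sorts its keys by default, which is why server1 comes before server2 and the NICs come out as eth0, eth1, eth2 regardless of row order in the CSV; passing sort=False keeps first-appearance order instead. The to_native step matters because json.dumps rejects NumPy integer types such as numpy.int64.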

I have looked at groupby and MultiIndex in the Pandas documentation, but as a newcomer I have got stuck.

Any help on the best method to achieve this is appreciated.

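It may help to group the df first: df_new = df.groupby(["hostname", "nic"], as_index=False).first(). Note that as_index=False keeps hostname and nic as ordinary columns, and an aggregation such as .first() is needed to turn the GroupBy object back into a DataFrame.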

You can then use df_new.to_json(orient='records', lines=True) to convert your df to JSON Lines format, with one flat JSON object per row (as jtweeder mentions in comments). Once you have the desired format and would like to write it out, you can do something like:

with open('temp.json', 'w') as f:
    f.write(df_new.to_json(orient='records', lines=True))
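For comparison, here is a sketch of the groupby plus to_json route from this answer, run against the same sample (again assuming a hypothetical CSV_TEXT string in place of test.csv). Each output line is a flat record with hostname and nic as ordinary fields, and a missing status is written as null rather than skipped, so this produces JSON Lines rather than the nested structure shown in the question.

```python
import io
import json

import pandas as pd

# Hypothetical constant holding the sample CSV from the question.
CSV_TEXT = """hostname,nic,vlan,status
server1,eth0,100,enabled
server1,eth2,200,enabled
server2,eth0,100
server2,eth1,100,enabled
server2,eth2,200
server1,eth1,100,disabled
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# .first() aggregates each (hostname, nic) group into one row and returns a
# DataFrame; as_index=False keeps the group keys as normal columns.
df_new = df.groupby(["hostname", "nic"], as_index=False).first()

# One JSON object per line (JSON Lines); NaN is written as null.
json_lines = df_new.to_json(orient="records", lines=True)

# Parse each line back to inspect the shape of the records.
records = [json.loads(line) for line in json_lines.splitlines() if line]
for record in records:
    print(record)
```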
