简体   繁体   中英

Complex Pandas Dataframe to Nested Dictionary/JSON

I have 3 Dataframes, I have merged them into a single one, and want to represent the dataframe into a Nested dictionary / json format.

df1: This contains general information about a patient.

>>> df1 = pd.DataFrame({'PatientId' : [1,2], 'Gender' : ['M', 'F'], 'Marital_status':['married', 'unmarried']})

>>> df1 

PatientId   Gender   Marital_status
1           M        married
2           F        unmarried 

df2: this contains details of each admission of a patient, and the diagnosis.

>>> df2 = pd.DataFrame({'PatientId': [1,1,2,2], 'AdmissionId' : [1,2,1,2], 'Diagnosis_Code': ['DXS', 'SDE', 'DEF', 'ATR'], 'Stay_Duration' : [45,14,79,32]})

>>> df2

PatientId   AdmissionId   Diagnosis_Code   Stay_Duration
1           1             DXS              45
1           2             SDE              14
2           1             DEF              79
2           2             ATR              32

df3: this dataframe contains all the lab testes reports of a patient carried out in each admission.

>>> df3 = pd.DataFrame(
    {
        'PatientId':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
        'AdmissionId' : [1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2],
        'LabTest' : [1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2],
        'LabName' : ['ABC', 'XYZ', 'ABC', 'XYZ', 'PQR', 'XYZ', 'ABC', 'XYZ', 'ABC', 'XYZ', 'ABC', 'XYZ', 'PQR', 'XYZ', 'ABC', 'PQR'],
        'LabValue' : [5.7, 1.9, 5.6, 2.4, 5.7, 1.9, 5.6, 2.4, 5.7, 1.9, 5.6, 2.4, 5.7, 1.9, 5.6, 2.4],
        'IsNormal' : [True, False, True, True, True, False, True, True, True, False, True, True, True, False, True, True]

    }
    )

>>> df3

PatientId   AdmissionId   LabTest   LabName    LabValue  IsNormal
1           1             1             ABC      5.7       True
1           1             1             XYZ      1.9       False
1           1             2             ABC      5.6       True
1           1             2             XYZ      2.4       True
1           2             1             PQR      5.7       True
1           2             1             XYZ      1.9       False
1           2             2             ABC      5.6       True
1           2             2             XYZ      2.4       True
2           1             1             ABC      5.7       True
2           1             1             XYZ      1.9       False
2           1             2             ABC      5.6       True
2           1             2             XYZ      2.4       True
2           2             1             PQR      5.7       True
2           2             1             XYZ      1.9       False
2           2             2             ABC      5.6       True
2           2             2             PQR      2.4       True

I want my output to look like this --

"PatientId" : 1
"Gender":M
"Marital_Status" : married
"AdmissionsInfo":
                {
                  "AdmissionID": 1
                  "Diagnosis": DXS
                  "Stay_Duration" : 45
                  "lab reports" :
                                 {
                                   "labtest":1
                                   "labinfo":
                                             {
                                              "labName":ABC
                                              "labValue":5.6
                                              "isNormal":True
                                             },
                                             {
                                              "labName": XYZ
                                              "labValue": 2.4
                                              "isNormal": True
                                             }
                                   "labtest":2
                                   "labinfo":
                                             {
                                              "labName":ABC
                                              "labValue":5.7
                                              "isNormal":True
                                             },
                                             {
                                              "labName": XYZ
                                              "labValue": 1.9
                                              "isNormal":False
                                             }
                                 }
                  "AdmissionID": 2
                  "Diagnosis": SDE
                  "Stay_Duration" : 45
                  /
                  /

                  //
               } end of patient 1's all admissions' info
"PatientId" : 2
"Gender": F
"Marital_Status" : unmarried
"AdmissionsInfo":
                             //
                             //
          and so on }}}

Find below a full pandas (although redundant) solution.

First merge your three dataframes into one callled df_merged :

df_merged = df3.merge(df1, on="PatientId").merge(df2, on=["PatientId", "AdmissionId"])

Now create the hierarchy you need (this part is ugly, but works, happy to receive feedback on it):

(df_merged.groupby(["PatientId", "Gender", "Marital_status", "AdmissionId", "Diagnosis_Code", "Stay_Duration", "LabTest"])
          .apply(lambda x: x[["LabName", "LabValue", "IsNormal"]].to_dict("r"))
          .reset_index()
          .rename(columns={0:"LabInfo"})
          .groupby(["PatientId", "Gender", "Marital_status", "AdmissionId", "Diagnosis_Code", "Stay_Duration"])
          .apply(lambda x: x[["LabTest", "LabInfo"]].to_dict("r"))
          .reset_index()
          .rename(columns={0:"LabReports"})
          .groupby(["PatientId", "Gender", "Marital_status"])
          .apply(lambda x: x[["AdmissionId", "Diagnosis_Code", "Stay_Duration", "LabReports"]].to_dict("r"))
          .reset_index()
          .rename(columns={0:"AdmissionsInfo"})
          .to_json(orient="records"))

And dump that into a json object:

>>> import json
>>> print(json.dumps(json.loads(j), indent=2, sort_keys=False))

And the result:

[
  {
    "PatientId": 1,
    "Gender": "M",
    "Marital_status": "married",
    "AdmissionsInfo": [
      {
        "AdmissionId": "1",
        "Diagnosis_Code": "DXS",
        "Stay_Duration": "45",
        "LabReports": [
          {
            "LabTest": "1",
            "LabInfo": [
              {
                "LabName": "ABC",
                "LabValue": "5.7",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "1.9",
                "IsNormal": "False"
              }
            ]
          },
          {
            "LabTest": "2",
            "LabInfo": [
              {
                "LabName": "ABC",
                "LabValue": "5.6",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "2.4",
                "IsNormal": "True"
              }
            ]
          }
        ]
      },
      {
        "AdmissionId": "2",
        "Diagnosis_Code": "SDE",
        "Stay_Duration": "14",
        "LabReports": [
          {
            "LabTest": "1",
            "LabInfo": [
              {
                "LabName": "PQR",
                "LabValue": "5.7",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "1.9",
                "IsNormal": "False"
              }
            ]
          },
          {
            "LabTest": "2",
            "LabInfo": [
              {
                "LabName": "ABC",
                "LabValue": "5.6",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "2.4",
                "IsNormal": "True"
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "PatientId": 2,
    "Gender": "F",
    "Marital_status": "unmarried",
    "AdmissionsInfo": [
      {
        "AdmissionId": "1",
        "Diagnosis_Code": "DEF",
        "Stay_Duration": "79",
        "LabReports": [
          {
            "LabTest": "1",
            "LabInfo": [
              {
                "LabName": "ABC",
                "LabValue": "5.7",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "1.9",
                "IsNormal": "False"
              }
            ]
          },
          {
            "LabTest": "2",
            "LabInfo": [
              {
                "LabName": "ABC",
                "LabValue": "5.6",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "2.4",
                "IsNormal": "True"
              }
            ]
          }
        ]
      },
      {
        "AdmissionId": "2",
        "Diagnosis_Code": "ATR",
        "Stay_Duration": "32",
        "LabReports": [
          {
            "LabTest": "1",
            "LabInfo": [
              {
                "LabName": "PQR",
                "LabValue": "5.7",
                "IsNormal": "True"
              },
              {
                "LabName": "XYZ",
                "LabValue": "1.9",
                "IsNormal": "False"
              }
            ]
          },
          {
            "LabTest": "2",
            "LabInfo": [
              {
                "LabName": "ABC",
                "LabValue": "5.6",
                "IsNormal": "True"
              },
              {
                "LabName": "PQR",
                "LabValue": "2.4",
                "IsNormal": "True"
              }
            ]
          }
        ]
      }
    ]
  }
]
In [84]: json_list = []                                                                                                                                                   

In [85]: for index1, row1 in df1.iterrows(): 
    ...:     d = dict(row1) 
    ...:     json_list.append(d) 
    ...:     for index2, row2 in df2[df2['PatientId'] == row1['PatientId']].iterrows(): 
    ...:         d2 = dict(row2) 
    ...:         del d2['PatientId'] 
    ...:         d['AdmissionsInfo'] = d2 
    ...:         lab_reports_list = [] 
    ...:         for index3, row3 in df3[(df3['PatientId'] == row2['PatientId']) & (df3['AdmissionId'] == row2['AdmissionId'])].iterrows(): 
    ...:             d3 = dict(row3) 
    ...:             d4 = {} 
    ...:             d4['labtest'] = row3['LabTest'] 
    ...:             d4['labinfo'] = {'labName': row3['LabName'], 'labValue': row3['LabValue'], 'isNormal': row3['IsNormal']} 
    ...:             lab_reports_list.append(d4) 
    ...:         d2['lab reports'] = lab_reports_list 

Output

In [87]: json_list                                                                                                                                                        
Out[87]: 

[{'PatientId': 1,
  'Gender': 'M',
  'Marital_status': 'married',
  'AdmissionsInfo': {'AdmissionId': 2,
   'Diagnosis_Code': 'SDE',
   'Stay_Duration': 14,
   'lab reports': [{'labtest': 1,
     'labinfo': {'labName': 'PQR', 'labValue': 5.7, 'isNormal': True}},
    {'labtest': 1,
     'labinfo': {'labName': 'XYZ', 'labValue': 1.9, 'isNormal': False}},
    {'labtest': 2,
     'labinfo': {'labName': 'ABC', 'labValue': 5.6, 'isNormal': True}},
    {'labtest': 2,
     'labinfo': {'labName': 'XYZ', 'labValue': 2.4, 'isNormal': True}}]}},
 {'PatientId': 2,
  'Gender': 'F',
  'Marital_status': 'unmarried',
  'AdmissionsInfo': {'AdmissionId': 2,
   'Diagnosis_Code': 'ATR',
   'Stay_Duration': 32,
   'lab reports': [{'labtest': 1,
     'labinfo': {'labName': 'PQR', 'labValue': 5.7, 'isNormal': True}},
    {'labtest': 1,
     'labinfo': {'labName': 'XYZ', 'labValue': 1.9, 'isNormal': False}},
    {'labtest': 2,
     'labinfo': {'labName': 'ABC', 'labValue': 5.6, 'isNormal': True}},
    {'labtest': 2,
     'labinfo': {'labName': 'PQR', 'labValue': 2.4, 'isNormal': True}}]}}]

Arnaud's response does the job but seems not very "Pythonic".

There might be something possible with aggregate functions on DataFrameGroupby. Tryed the .agg(dict) but not functional.

for those wanting to help:

patient_df = pd.DataFrame({'PatientId' : [1,2], 'Gender' : ['M', 'F'], 'Marital_status':['married', 'unmarried']})

admission_df = pd.DataFrame({'PatientId': [1,1,2,2], 'AdmissionId' : [1,2,1,2], 'Diagnosis_Code': ['DXS', 'SDE', 'DEF', 'ATR'], 'Stay_Duration' : [45,14,79,32]})

lab_df = pd.DataFrame(
{
    'PatientId':[1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
    'AdmissionId' : [1,1,1,1,2,2,2,2,1,1,1,1,2,2,2,2],
    'LabTest' : [1,1,2,2,1,1,2,2,1,1,2,2,1,1,2,2],
    'LabName' : ['ABC', 'XYZ', 'ABC', 'XYZ', 'PQR', 'XYZ', 'ABC', 'XYZ', 'ABC', 'XYZ', 'ABC', 'XYZ', 'PQR', 'XYZ', 'ABC', 'PQR'],
    'LabValue' : [5.7, 1.9, 5.6, 2.4, 5.7, 1.9, 5.6, 2.4, 5.7, 1.9, 5.6, 2.4, 5.7, 1.9, 5.6, 2.4],
    'IsNormal' : [True, False, True, True, True, False, True, True, True, False, True, True, True, False, True, True]

}
)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM