简体   繁体   中英

How to convert a pandas df into a nested json

I am trying to convert a pandas dataframe into a nested json and am stuck. The Dataframe is containing Data in the following format:

   description         69    70   project_id    60
1  Lorem Ipsum...            14     5679        CA   
2  Lorem Ipsum...      hom   15     2904        CA  
3  Lorem Ipsum...      im    14     5679        CA       

What I want is a JSON that looks like this:

    [ {
  "issue": {
      "project_id": 5679,
      "description": "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum.",
      "custom_fields": [
        {
          "id": 69,
          "value": null
        },
        {
          "id": 60,
          "value": "CA"
        },
        {
          "id": 70,
          "value": "14"
        }
      ]
    }
  },
  {
  "issue": {
      "project_id": 2904,
      "description": "Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet ",
      "custom_fields": [
        {
          "id": 69,
          "value": "hom"
        },
        {
          "id": 60,
          "value": "CA"
        },
        {
          "id": 70,
          "value": "15"
        }
      ]
    }
  },

This is what I have so far, but it doesn´t result in the needed format, what I don´t get for example are the headers of the JSON ("issue"...) and the nesting ("id": ..., "value": ...)

j = (df_test.groupby(['description','project_id'])
       .apply(lambda x: x[['69', '70', '84', '60']].to_dict('records'))
       .reset_index()
       .rename(columns={0:'custom_fields'})
       .to_json(orient='records'))

Any ideas on what I have to adjust? Thanks in advance on any help and tips

Instead of putting it to one chained function call I would propose to simply loop over the data frame and process the information as needed:

import pandas as pd
import json

# create example data
df = pd.DataFrame({"description" : ['Lorem Ipsum', 'Lorem Ipsum', 'Lorem Ipsum'],
                   69: [None, "hom", "im"],
                   70: [14, 15, 14],
                   "project_id" : [5679, 2904, 5679],
                   60: ["CA", "CA", "CA"]
                   
                  })


# convert data to desired format
res = []
custom_field_keys = [69, 60, 70]

for idx, row in df.iterrows():
    entry = { key: value for key, value in row.items() if key not in custom_field_keys }
    entry["custom_fields"] = [
        {"id": key, "value": value} for key, value in row.items() if key in custom_field_keys
    ]
    res.append({"issue": entry})


print(json.dumps(res, indent=2))

this yields:

[
  {
    "issue": {
      "description": "Lorem Ipsum",
      "project_id": 5679,
      "custom_fields": [
        {
          "id": 69,
          "value": null
        },
        {
          "id": 70,
          "value": 14
        },
        {
          "id": 60,
          "value": "CA"
        }
      ]
    }
  },
  {
    "issue": {
      "description": "Lorem Ipsum",
      "project_id": 2904,
      "custom_fields": [
        {
          "id": 69,
          "value": "hom"
        },
        {
          "id": 70,
          "value": 15
        },
        {
          "id": 60,
          "value": "CA"
        }
      ]
    }
  },
  {
    "issue": {
      "description": "Lorem Ipsum",
      "project_id": 5679,
      "custom_fields": [
        {
          "id": 69,
          "value": "im"
        },
        {
          "id": 70,
          "value": 14
        },
        {
          "id": 60,
          "value": "CA"
        }
      ]
    }
  }
]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM