
How to convert Pandas DataFrame to custom nested JSON?

I am new to Pandas and I am trying to convert a Pandas DataFrame to a custom nested JSON string (and possibly write it to a file). I tried the built-in Pandas to_json() function, but it didn't produce the structure I need. Below is a portion of my DataFrame and what I want the final result to look like. Ideally, the "id" key would be populated with the DataFrame's index. I would rather not hard-code the column names; I want a way to turn the DataFrame into a JSON string programmatically. I originally wrote a for loop that iterated over every row and wrote its contents to a file, but on reflection I think that approach is error-prone, since most of the JSON serialization is done by hand. Any help would be appreciated. Thank you, and sorry for the long post.

Pandas Dataframe

              BarcodeSequence LinkerPrimerSequence    BodySite    Year  Month   Day    Subject ReportedAntibioticUsage  DaysSinceExperimentStart                      Description
#SampleID
L1S8         AGCTGACTAGTC  GTGCCAGCMGCCGCGGTAA         gut  2008.0   10.0  28.0  subject-1                     Yes                       0.0         subject-1.gut.2008-10-28
L1S57        ACACACTATGGC  GTGCCAGCMGCCGCGGTAA         gut  2009.0    1.0  20.0  subject-1                      No                      84.0          subject-1.gut.2009-1-20
L1S76        ACTACGTGTGGT  GTGCCAGCMGCCGCGGTAA         gut  2009.0    2.0  17.0  subject-1                      No                     112.0          subject-1.gut.2009-2-17
L1S105       AGTGCGATGCGT  GTGCCAGCMGCCGCGGTAA         gut  2009.0    3.0  17.0  subject-1                      No                     140.0          subject-1.gut.2009-3-17
L2S155       ACGATGCGACCA  GTGCCAGCMGCCGCGGTAA   left palm  2009.0    1.0  20.0  subject-1                      No                      84.0    subject-1.left-palm.2009-1-20
L2S175       AGCTATCCACGA  GTGCCAGCMGCCGCGGTAA   left palm  2009.0    2.0  17.0  subject-1                      No                     112.0    subject-1.left-palm.2009-2-17
L2S204       ATGCAGCTCAGT  GTGCCAGCMGCCGCGGTAA   left palm  2009.0    3.0  17.0  subject-1                      No                     140.0    subject-1.left-palm.2009-3-17
L2S222       CACGTGACATGT  GTGCCAGCMGCCGCGGTAA   left palm  2009.0    4.0  14.0  subject-1                      No                     168.0    subject-1.left-palm.2009-4-14
L3S242       ACAGTTGCGCGA  GTGCCAGCMGCCGCGGTAA  right palm  2008.0   10.0  28.0  subject-1                     Yes                       0.0  subject-1.right-palm.2008-10-28
L3S294       CACGACAGGCTA  GTGCCAGCMGCCGCGGTAA  right palm  2009.0    

Expected JSON String

[
{
  "id": "L1S8",
  "metadata": {
    "BarcodeSequence": "AGCTGACTAGTC",
    "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
    "BodySite": "gut",
    "Year": 2008.0,
    "Month": 10.0,
    "Day": 28.0,
    "Subject": "subject-1",
    "ReportedAntibioticUsage": "Yes",
    "DaysSinceExperimentStart": 0.0,
    "Description": "subject-1.gut.2008-10-28"
  },
  "sample_frequency": "7068.0"
},
{
  "id": "L1S57",
  "metadata": {
    "BarcodeSequence": "ACACACTATGGC",
    "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
    "BodySite": "gut",
    "Year": 2009.0,
    "Month": 1.0,
    "Day": 20.0,
    "Subject": "subject-1",
    "ReportedAntibioticUsage": "No",
    "DaysSinceExperimentStart": 84.0,
    "Description": "subject-1.gut.2009-1-20"
  },
  "sample_frequency": "8756.0"
},
{
  "id": "L1S76",
  "metadata": {
    "BarcodeSequence": "ACTACGTGTGGT",
    "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
    "BodySite": "gut",
    "Year": 2009.0,
    "Month": 2.0,
    "Day": 17.0,
    "Subject": "subject-1",
    "ReportedAntibioticUsage": "No",
    "DaysSinceExperimentStart": 112.0,
    "Description": "subject-1.gut.2009-2-17"
  },
  "sample_frequency": "7922.0"
},
{
  "id": "L1S105",
  "metadata": {
    "BarcodeSequence": "AGTGCGATGCGT",
    "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
    "BodySite": "gut",
    "Year": 2009.0,
    "Month": 3.0,
    "Day": 17.0,
    "Subject": "subject-1",
    "ReportedAntibioticUsage": "No",
    "DaysSinceExperimentStart": 140.0,
    "Description": "subject-1.gut.2009-3-17"
  },
  "sample_frequency": "7865.0"
}
]

Here is a way to build the JSON more or less dynamically. You still need to make a few assumptions, though, and I am not sure whether your use case allows them:

  1. Column names are unique.
  2. You know the name of the column you want to use as the "value" column. In my sample dataframe I have called this one value; sample_frequency would be the "value" column in your example dataframe.
  3. You will use the dataframe index as the id parameter. This may or may not be acceptable. If you need to pick the id column explicitly instead, set it as the dataframe index first using .set_index().
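The assumptions above can be checked up front. A minimal sketch, assuming a hypothetical column sample_id holds the identifiers you want in "id" (in the question they are already the index, #SampleID), and using a couple of values taken from the expected JSON:

```python
import pandas as pd

# Hypothetical frame where the identifiers live in an ordinary column
df = pd.DataFrame(
    {
        "sample_id": ["L1S8", "L1S57"],
        "BodySite": ["gut", "gut"],
        "sample_frequency": [7068.0, 8756.0],
    }
)

# Assumption 3: move the identifier column into the index
df = df.set_index("sample_id")

# Assumption 1: with duplicate column names, row[j] returns a Series, not a scalar
assert df.columns.is_unique
# Assumption 2: the chosen "value" column must exist
assert "sample_frequency" in df.columns
# Unique index labels keep the "id" fields unambiguous
assert df.index.is_unique
```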

With that said:

import pandas as pd
import numpy as np
import json

data = pd.DataFrame(
    {
        'meta_1': np.random.choice(['A', 'B', 'C'], 10),
        'meta_2': np.random.choice(['Blue', 'Green', 'Red'], 10),
        'value': np.random.rand(10)
    }
)

print(data)

Here is the data:

    meta_1 meta_2     value
0      A    Red  0.095142
1      C    Red  0.855082
2      C   Blue  0.619704
3      B  Green  0.371495
4      A    Red  0.000771
5      B  Green  0.027218
6      B   Blue  0.655847
7      B   Blue  0.657976
8      A  Green  0.060862
9      C    Red  0.702788
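Because the sample data is drawn at random, every run prints different values (which is also why the JSON output further down does not match this table). A seeded generator makes the example reproducible; the seed value 0 is arbitrary:

```python
import numpy as np
import pandas as pd

# Seeded generator: the same seed yields the same frame on every run
rng = np.random.default_rng(0)

data = pd.DataFrame(
    {
        'meta_1': rng.choice(['A', 'B', 'C'], 10),
        'meta_2': rng.choice(['Blue', 'Green', 'Red'], 10),
        'value': rng.random(10),
    }
)
print(data)
```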

Now set the column you want to use as the "value" column.

val_col_name = 'value'

Then a list comprehension with a nested dict comprehension:

json.dumps([{'id': i, 'metadata': {j: row[j] for j in data.columns if j != val_col_name}, val_col_name: row[val_col_name]} for i, row in data.iterrows()])

Gives:

[{"id": 0, "metadata": {"meta_1": "B", "meta_2": "Red"}, "value": 0.3169439789955154}, {"id": 1, "metadata": {"meta_1": "C", "meta_2": "Green"}, "value": 0.5672345948633107}, {"id": 2, "metadata": {"meta_1": "B", "meta_2": "Red"}, "value": 0.36909249143056766}, {"id": 3, "metadata": {"meta_1": "C", "meta_2": "Red"}, "value": 0.8033913639248945}, {"id": 4, "metadata": {"meta_1": "B", "meta_2": "Red"}, "value": 0.04500655943447107}, {"id": 5, "metadata": {"meta_1": "A", "meta_2": "Red"}, "value": 0.43388699497426875}, {"id": 6, "metadata": {"meta_1": "C", "meta_2": "Green"}, "value": 0.14265358049247878}, {"id": 7, "metadata": {"meta_1": "C", "meta_2": "Red"}, "value": 0.7823049064345722}, {"id": 8, "metadata": {"meta_1": "B", "meta_2": "Blue"}, "value": 0.9522025604707016}, {"id": 9, "metadata": {"meta_1": "C", "meta_2": "Red"}, "value": 0.3863207799791931}]
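The question mentions that to_json() alone did not give the right shape. One option (a sketch, not the only way) is to let to_json(orient="index") serialize the cell values, which converts NumPy scalars and writes NaN as null, and then only reshape the parsed result in plain Python. The helper name df_to_nested_json and its parameters are hypothetical. Note that orient="index" requires a unique index and unique column names, and because JSON object keys are strings, a numeric index comes back as "0", "1", and so on.

```python
import json

import pandas as pd


def df_to_nested_json(df, value_col, id_key="id", meta_key="metadata", path=None):
    """Hypothetical helper: one {id, metadata, value_col} object per row."""
    # pandas does the value serialisation (NumPy types, NaN -> null);
    # orient="index" gives {index_label: {column: value, ...}, ...}
    by_row = json.loads(df.to_json(orient="index"))

    records = []
    for row_id, row in by_row.items():
        value = row.pop(value_col)  # every remaining column becomes metadata
        records.append({id_key: row_id, meta_key: row, value_col: value})

    if path is not None:
        # Optionally write the pretty-printed result to a file
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(records, fh, indent=2)
    return records


# Two rows taken from the question's expected output
df = pd.DataFrame(
    {
        "BodySite": ["gut", "gut"],
        "Year": [2008.0, 2009.0],
        "sample_frequency": [7068.0, 8756.0],
    },
    index=pd.Index(["L1S8", "L1S57"], name="#SampleID"),
)
print(json.dumps(df_to_nested_json(df, "sample_frequency"), indent=2))
```

The expected output in the question shows sample_frequency as a string ("7068.0"); if that is really required, convert that column with .astype(str) before calling the helper.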
