I am new to Pandas and I am trying to convert a Pandas DataFrame to a custom nested JSON string (and maybe write it to a file). I tried the built-in Pandas to_json() function, but it didn't quite work for me. Below I am posting a portion of my DataFrame, followed by what I want the final result to look like. Ideally, the "id" key would be populated with whatever the index of the DataFrame is. My goal is not to depend on what the column names are, but to have a way to turn the DataFrame into a JSON string programmatically. I originally wrote a for loop that iterated through every row and wrote the contents to a file, but after some thinking I believe that would be error-prone, since most of the JSON serialization would be handled manually. Any help would be appreciated. Thank you, and sorry for the long post.
           BarcodeSequence  LinkerPrimerSequence  BodySite    Year    Month  Day   Subject    ReportedAntibioticUsage  DaysSinceExperimentStart  Description
#SampleID
L1S8       AGCTGACTAGTC     GTGCCAGCMGCCGCGGTAA   gut         2008.0  10.0   28.0  subject-1  Yes                      0.0                       subject-1.gut.2008-10-28
L1S57      ACACACTATGGC     GTGCCAGCMGCCGCGGTAA   gut         2009.0  1.0    20.0  subject-1  No                       84.0                      subject-1.gut.2009-1-20
L1S76      ACTACGTGTGGT     GTGCCAGCMGCCGCGGTAA   gut         2009.0  2.0    17.0  subject-1  No                       112.0                     subject-1.gut.2009-2-17
L1S105     AGTGCGATGCGT     GTGCCAGCMGCCGCGGTAA   gut         2009.0  3.0    17.0  subject-1  No                       140.0                     subject-1.gut.2009-3-17
L2S155     ACGATGCGACCA     GTGCCAGCMGCCGCGGTAA   left palm   2009.0  1.0    20.0  subject-1  No                       84.0                      subject-1.left-palm.2009-1-20
L2S175     AGCTATCCACGA     GTGCCAGCMGCCGCGGTAA   left palm   2009.0  2.0    17.0  subject-1  No                       112.0                     subject-1.left-palm.2009-2-17
L2S204     ATGCAGCTCAGT     GTGCCAGCMGCCGCGGTAA   left palm   2009.0  3.0    17.0  subject-1  No                       140.0                     subject-1.left-palm.2009-3-17
L2S222     CACGTGACATGT     GTGCCAGCMGCCGCGGTAA   left palm   2009.0  4.0    14.0  subject-1  No                       168.0                     subject-1.left-palm.2009-4-14
L3S242     ACAGTTGCGCGA     GTGCCAGCMGCCGCGGTAA   right palm  2008.0  10.0   28.0  subject-1  Yes                      0.0                       subject-1.right-palm.2008-10-28
L3S294     CACGACAGGCTA     GTGCCAGCMGCCGCGGTAA   right palm  2009.0
[
  {
    "id": "L1S8",
    "metadata": {
      "BarcodeSequence": "AGCTGACTAGTC",
      "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
      "BodySite": "gut",
      "Year": 2008.0,
      "Month": 10.0,
      "Day": 28.0,
      "Subject": "subject-1",
      "ReportedAntibioticUsage": "Yes",
      "DaysSinceExperimentStart": 0.0,
      "Description": "subject-1.gut.2008-10-28"
    },
    "sample_frequency": "7068.0"
  },
  {
    "id": "L1S57",
    "metadata": {
      "BarcodeSequence": "ACACACTATGGC",
      "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
      "BodySite": "gut",
      "Year": 2009.0,
      "Month": 1.0,
      "Day": 20.0,
      "Subject": "subject-1",
      "ReportedAntibioticUsage": "No",
      "DaysSinceExperimentStart": 84.0,
      "Description": "subject-1.gut.2009-1-20"
    },
    "sample_frequency": "8756.0"
  },
  {
    "id": "L1S76",
    "metadata": {
      "BarcodeSequence": "ACTACGTGTGGT",
      "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
      "BodySite": "gut",
      "Year": 2009.0,
      "Month": 2.0,
      "Day": 17.0,
      "Subject": "subject-1",
      "ReportedAntibioticUsage": "No",
      "DaysSinceExperimentStart": 112.0,
      "Description": "subject-1.gut.2009-2-17"
    },
    "sample_frequency": "7922.0"
  },
  {
    "id": "L1S105",
    "metadata": {
      "BarcodeSequence": "AGTGCGATGCGT",
      "LinkerPrimerSequence": "GTGCCAGCMGCCGCGGTAA",
      "BodySite": "gut",
      "Year": 2009.0,
      "Month": 3.0,
      "Day": 17.0,
      "Subject": "subject-1",
      "ReportedAntibioticUsage": "No",
      "DaysSinceExperimentStart": 140.0,
      "Description": "subject-1.gut.2009-3-17"
    },
    "sample_frequency": "7865.0"
  }
]
Here is a way to build the JSON (sort of) dynamically. You still need to make a few assumptions, though, and I am not sure whether your use case will admit them.
First, the column that sits outside "metadata" has to be named in advance. Below it is called value; sample_frequency would be the "value" column in your example dataframe.
Second, the dataframe index is used as the id parameter. This may or may not be acceptable. It may be that you need to identify this column in advance too, in which case you should set it as the dataframe index using .set_index().
With that said:
import pandas as pd
import numpy as np
import json
data = pd.DataFrame(
{
'meta_1': np.random.choice(['A', 'B', 'C'], 10),
'meta_2': np.random.choice(['Blue', 'Green', 'Red'], 10),
'value': np.random.rand(10)
}
)
print(data)
Here is the data:
meta_1 meta_2 value
0 A Red 0.095142
1 C Red 0.855082
2 C Blue 0.619704
3 B Green 0.371495
4 A Red 0.000771
5 B Green 0.027218
6 B Blue 0.655847
7 B Blue 0.657976
8 A Green 0.060862
9 C Red 0.702788
Now set the column you want to use as the "value" column.
val_col_name = 'value'
Then a list comprehension with a nested dict comprehension:
json.dumps([{'id': i, 'metadata': {j: row[j] for j in data.columns if j != val_col_name}, val_col_name: row[val_col_name]} for i, row in data.iterrows()])
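Gives output like the following (the values do not match the table above because the random data is unseeded and was regenerated between runs):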
[{"id": 0, "metadata": {"meta_1": "B", "meta_2": "Red"}, "value": 0.3169439789955154}, {"id": 1, "metadata": {"meta_1": "C", "meta_2": "Green"}, "value": 0.5672345948633107}, {"id": 2, "metadata": {"meta_1": "B", "meta_2": "Red"}, "value": 0.36909249143056766}, {"id": 3, "metadata": {"meta_1": "C", "meta_2": "Red"}, "value": 0.8033913639248945}, {"id": 4, "metadata": {"meta_1": "B", "meta_2": "Red"}, "value": 0.04500655943447107}, {"id": 5, "metadata": {"meta_1": "A", "meta_2": "Red"}, "value": 0.43388699497426875}, {"id": 6, "metadata": {"meta_1": "C", "meta_2": "Green"}, "value": 0.14265358049247878}, {"id": 7, "metadata": {"meta_1": "C", "meta_2": "Red"}, "value": 0.7823049064345722}, {"id": 8, "metadata": {"meta_1": "B", "meta_2": "Blue"}, "value": 0.9522025604707016}, {"id": 9, "metadata": {"meta_1": "C", "meta_2": "Red"}, "value": 0.3863207799791931}]
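To tie this back to the data in the question, here is a minimal sketch of the same comprehension wrapped in a function and applied to a frame indexed by sample ID. The names to_nested_records and _to_builtin are made up for illustration, the toy frame is a trimmed stand-in for the real data, and I am assuming sample_frequency is a column of the dataframe (it does not appear in the portion posted). Note that sample_frequency comes out as a number here, whereas the desired output shows it as a string; wrap it in str() if the string form is required.

```python
import json

import numpy as np
import pandas as pd


def to_nested_records(df, value_col):
    """Return one {"id", "metadata", value_col} dict per row, using the index as id."""
    meta_cols = [c for c in df.columns if c != value_col]
    records = []
    for idx, row in df.iterrows():
        records.append({
            "id": idx,
            "metadata": {c: row[c] for c in meta_cols},
            value_col: row[value_col],
        })
    return records


def _to_builtin(obj):
    # json.dumps calls this only for objects it cannot serialize itself,
    # e.g. NumPy integers; .item() converts a NumPy scalar to a Python one.
    if isinstance(obj, np.generic):
        return obj.item()
    raise TypeError(f"Cannot serialize {type(obj).__name__}")


# Toy frame shaped like the question's data (assumption: trimmed to two
# metadata columns, with sample_frequency present as a regular column).
df = pd.DataFrame({
    "SampleID": ["L1S8", "L1S57"],
    "BodySite": ["gut", "gut"],
    "Year": [2008.0, 2009.0],
    "sample_frequency": [7068.0, 8756.0],
}).set_index("SampleID")

records = to_nested_records(df, "sample_frequency")
text = json.dumps(records, indent=2, default=_to_builtin)
print(text)
```

To write to a file instead of building a string, json.dump takes the same arguments plus an open file handle. The default= hook is only a safety net for NumPy scalar types that the json module rejects; it is not needed if every value is already a plain Python str or float.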