I want to convert JSON file into proper format. I have a JSON file as given below:
{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"details":"|seedless:true|,|condition:New|"
},
{
"fruit": "Almond",
"size": "small",
"color": "brown",
"details":"|Type:dry|,|seedless:true|,|condition:New|"
}
You can see the data in the details can vary.
I want to change it into :
{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"seedless":"true",
"condition":"New",
},
{
"fruit": "Almond",
"size": "small",
"color": "brown",
"Type":"dry",
"seedless":"true",
"condition":"New",
}
I have tried doing it in python using pandas as:
import json
import pandas as pd
import re
df = pd.read_json("data.json",lines=True)
#I tried to change the pattern of data in details column as
re1 = re.compile('r/|(.?):(.?)|/')
re2 = re.compile('r\"(.*?)\":\"(.*?)\"')
df.replace({'details' :re1}, {'details' : re2},inplace = True, regex = True);
But that giving output as "objects" in all the rows of details column.
Try this,
for d in data:
details = d.pop('details')
d.update(dict(x.split(":") for x in details.split("|") if ":" in x))
print(data)
[{'color': 'Red',
'condition': 'New',
'fruit': 'Apple',
'seedless': 'true',
'size': 'Large'},
{'Type': 'dry',
'color': 'brown',
'condition': 'New',
'fruit': 'Almond',
'seedless': 'true',
'size': 'small'}]
You can convert the (list of) dictionaries to a pandas data frame.
import pandas as pd
# data is a list of dictionaries
data = [{
"fruit": "Apple",
"size": "Large",
"color": "Red",
"details":"|seedless:true|,|condition:New|"
},
{
"fruit": "Almond",
"size": "small",
"color": "brown",
"details":"|Type:dry,|seedless:true|,|condition:New|"
}]
# convert to data frame
df = pd.DataFrame(data)
# remove '|' from details and convert to list
df['details'] = df['details'].str.replace(r'\|', '').str.split(',')
# explode list => one row for each element
df = df.explode('details')
# split details into name/value pair
df[['name', 'value']] = df['details'].str.split(':').apply(lambda x: pd.Series(x))
# drop details column
df = df.drop(columns='details')
print(df)
fruit size color name value
0 Apple Large Red seedless true
0 Apple Large Red condition New
1 Almond small brown Type dry
1 Almond small brown seedless true
1 Almond small brown condition New
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.