I'm reading a DataFrame and converting it to a JSON file, using Python 3 and pandas 0.25.3. I already got some help here (Manipulating data of Pandas dataframe), but I have some questions about the code and how it works.
My dataframe:
id  label      id_customer  label_customer  part_number  number_client
6   Sao Paulo  CUST-99992   Brazil          7897         982
6   Sao Paulo  CUST-99992   Brazil          888          12
92  Hong Kong  CUST-88888   China           147          288
Code:
import pandas as pd
data = pd.read_excel(path)
data[["part_number","number_client"]] = data[["part_number","number_client"]].astype(str)
f = lambda x: x.split('_')[0]
j = (data.groupby(["id","label","id_customer","label_customer"])['part_number','number_client']
     .apply(lambda x: x.rename(columns=f).to_dict('r')).reset_index(name='Number')
     .groupby(["id", "label"])["id_customer", "label_customer", "Number"]
     .apply(lambda x: x.rename(columns=f).to_dict('r')).reset_index(name='Customer')
     .to_json(orient='records'))
print(j)
Json I'm getting:
[{
        "id": 6,
        "label": "Sao Paulo",
        "Customer": [{
            "id": "CUST-99992",
            "label": "Brazil",
            "Number": [{
                    "part": "7897",
                    "client": "982"
                },
                {
                    "part": "888",
                    "client": "12"
                }
            ]
        }]
    },
    {
        "id": 92,
        "label": "Hong Kong",
        "Customer": [{
            "id": "CUST-88888",
            "label": "China",
            "Number": [{
                "part": "147",
                "client": "288"
            }]
        }]
    }
]
1st Question: the lambda and apply functions are splitting my column names wherever a _ is found. This is just a piece of my dataframe, and I'd like to preserve the names of some columns. E.g., I want to get part_number and number_client instead of part and client in my JSON structure. How can I fix this?
2nd Question: I can have different lists with the same key name. E.g., in the customer list I have a part_number key, but I can also have a key with the same name, holding another value, inside another list, such as part_number inside a test list.
3rd Question: In my complete dataframe, I have a column called Additional_information that holds simple text. I have to get a structure like this:
...
"Additional_information": [
    {
        "text": "testing"
    },
    {
        "text": "testing again"
    }
]
for a dataframe like this:
id  label      id_customer  label_customer  part_number  number_client  Additional_information
6   Sao Paulo  CUST-99992   Brazil          7897         982            testing
6   Sao Paulo  CUST-99992   Brazil          7897         982            testing again
What should I change?
1st Question:
You can write a custom function for rename that leaves the listed columns unchanged and splits only the others, e.g.:
def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]
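As a quick check of the function above, it can be applied to the column names on their own, without pandas. This is only a minimal sketch; the list columns is an illustrative set of names taken from the question.

```python
def f(x):
    # Columns listed here keep their full name; all others are cut at the first "_".
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]

# Illustrative column names from the question's DataFrame and intermediate result
columns = ['id_customer', 'label_customer', 'part_number', 'number_client', 'Number']
renamed = [f(c) for c in columns]
print(renamed)
# ['id', 'label', 'part_number', 'number_client', 'Number']
```

Any further column that should keep its underscore would have to be added to vals.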
2nd Question:
If I understand correctly, the keys in the final JSON are created from the column names of the original DataFrame, and also from the name parameter of reset_index in my solution. If you want different logic for changing the keys (column names), you can change the rename function from the first solution.
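On the concern about the same key name appearing in different lists: keys only need to be unique within a single JSON object, so a part_number under Customer and a part_number under another list should not clash. A small illustration follows; the test list and its value are hypothetical, taken only from the wording of the question.

```python
import json

record = {
    "id": 6,
    "Customer": [{"id": "CUST-99992", "part_number": "7897"}],
    # "test" is a hypothetical second list; its value is made up for illustration
    "test": [{"part_number": "888"}],
}

# Round-trip through JSON to show both keys survive independently
text = json.dumps(record)
parsed = json.loads(text)
print(parsed["Customer"][0]["part_number"])
print(parsed["test"][0]["part_number"])
```

The two part_number values live in different nested objects, so each is reached through its own parent list.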
3rd Question:
In the original solution, to_json is changed to to_dict so that the final list of dicts can be modified, e.g. by appending the text info. json.dumps is then used in the last step to produce the JSON:
import json
def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]
# data is the DataFrame read with pd.read_excel(path) in the question.
# A list is used to select the columns and 'records' is spelled out,
# because newer pandas versions no longer accept the tuple and 'r' shorthands.
d = (data.groupby(["id","label","id_customer","label_customer"])[['part_number','number_client']]
     .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
     .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
     .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
     .to_dict(orient='records'))
#print (d)
# Build the list of {"text": ...} dicts from the Additional_information column
d1 = (data[['Additional_information']].rename(columns={'Additional_information':'text'})
      .to_dict(orient='records'))
d1 = {'Additional_information':d1}
print (d1)
# Output: {'Additional_information': [{'text': 'testing'}, {'text': 'testing again'}]}
d.append(d1)
#print (d)
j = json.dumps(d)
#print (j)
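To try the steps above without an Excel file, the sample rows from the question can be built inline. This is a sketch under assumptions: the DataFrame literal stands in for pd.read_excel(path), the column selection uses a list and to_dict('records') (the spellings newer pandas versions expect), and default=int in json.dumps is only a precaution in case the ids come back as NumPy integers.

```python
import json
import pandas as pd

# Stand-in for pd.read_excel(path): the two sample rows from the 3rd question
data = pd.DataFrame({
    "id": [6, 6],
    "label": ["Sao Paulo", "Sao Paulo"],
    "id_customer": ["CUST-99992", "CUST-99992"],
    "label_customer": ["Brazil", "Brazil"],
    "part_number": ["7897", "7897"],
    "number_client": ["982", "982"],
    "Additional_information": ["testing", "testing again"],
})

def f(x):
    # Keep the listed columns as they are; cut the others at the first "_"
    vals = ['part_number', 'number_client']
    return x if x in vals else x.split('_')[0]

# Two nested groupby/apply steps, as in the answer's code
d = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_dict(orient='records'))

# Collect the free-text column as a list of {"text": ...} dicts
d1 = {'Additional_information': data[['Additional_information']]
      .rename(columns={'Additional_information': 'text'})
      .to_dict(orient='records')}

d.append(d1)  # appended as a separate top-level element, as in the answer
j = json.dumps(d, default=int)
print(j)
```

Note that d.append(d1) adds the text entries as a separate element at the end of the top-level list; placing them inside each id record would need a different grouping step.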