Manipulating data of a dataframe in Pandas

Question

I'm reading a dataframe and converting it to a JSON file, using Python 3 and pandas 0.25.3. I already got some help with this (see "Manipulating data of Pandas dataframe"), but I have some questions about the code and how it works.

My dataframe:

id     label        id_customer     label_customer    part_number   number_client

6     Sao Paulo      CUST-99992         Brazil          7897           982

6     Sao Paulo      CUST-99992         Brazil          888            12

92    Hong Kong      CUST-88888         China           147            288

Code:

import pandas as pd

data = pd.read_excel(path)

data[["part_number","number_client"]] = data[["part_number","number_client"]].astype(str)

f = lambda x: x.split('_')[0]

j = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_json(orient='records'))

print (j)

Json I'm getting:

[{
        "id": 6,
        "label": "Sao Paulo",
        "Customer": [{
            "id": "CUST-99992",
            "label": "Brazil",
            "number": [{
                    "part": "7897",
                    "client": "982"
                },
                {
                    "part": "888",
                    "client": "12"
                }
            ]
        }]
    },
    {
        "id": 92,
        "label": "Hong Kong",
        "Customer": [{
            "id": "CUST-88888",
            "label": "China",
            "number": [{
                "part": "147",
                "client": "288"
            }]
        }]
    }
]

1st Question: The lambda passed to rename splits my column names wherever a _ is found. This is just a piece of my dataframe, and for some columns I'd like to preserve the name. E.g. I want to get part_number and number_client instead of part and client in my JSON structure. How can I fix this?

2nd Question: I can have different lists with the same key name. E.g. the customer list has a part_number key, but the same key name can also appear inside another list with a different value, such as part_number inside a test list.

3rd Question: In my complete dataframe, I have a column called Additional_information that holds simple text. I need to get a structure like this:

...

"Additional_information":[
        {
          "text": "testing"
        },
        {
          "text": "testing again"
        }
        ]

for a dataframe like this:

id     label        id_customer     label_customer    part_number   number_client    Additional_information

6     Sao Paulo      CUST-99992         Brazil          7897           982           testing

6     Sao Paulo      CUST-99992         Brazil          7897           982           testing again

What should I change?

Answer 1

1st Question:

You can write a custom function for rename, for example:

def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]
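
To see what this function does to the column names, here is a minimal sketch that applies it to a one-row frame built from the question's sample data (the frame itself is only an illustration):

```python
import pandas as pd

def f(x):
    # Names listed in vals are kept as-is;
    # every other name is cut at the first underscore.
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    return x.split('_')[0]

# Illustrative one-row frame using column names from the question.
df = pd.DataFrame({
    'id_customer': ['CUST-99992'],
    'label_customer': ['Brazil'],
    'part_number': ['7897'],
    'number_client': ['982'],
})

renamed = df.rename(columns=f)
print(list(renamed.columns))
# ['id', 'label', 'part_number', 'number_client']
```

Any other column whose name should survive unchanged can be added to vals.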

2nd Question

If I understand correctly, the keys in the final JSON are created from the columns of the original DataFrame, and also from the name parameter passed to reset_index in my solution. If you want different logic for changing the keys (column names), the first solution can be adapted.
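
One possible way to adapt it, sketched here under assumptions: rename each nesting level with its own explicit mapping, so the same key name can appear in different lists with different values. The dict customer_keys is a hypothetical name introduced for this illustration, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data rebuilt from the question.
data = pd.DataFrame({
    'id': [6, 6, 92],
    'label': ['Sao Paulo', 'Sao Paulo', 'Hong Kong'],
    'id_customer': ['CUST-99992', 'CUST-99992', 'CUST-88888'],
    'label_customer': ['Brazil', 'Brazil', 'China'],
    'part_number': ['7897', '888', '147'],
    'number_client': ['982', '12', '288'],
})

# Hypothetical mapping, applied only at the Customer level.
customer_keys = {'id_customer': 'id', 'label_customer': 'label'}

# Inner level: no rename, so part_number and number_client keep their names.
inner = (data.groupby(['id', 'label', 'id_customer', 'label_customer'])[['part_number', 'number_client']]
             .apply(lambda x: x.to_dict('records'))
             .reset_index(name='Number'))

# Outer level: only the customer columns are renamed.
d = (inner.groupby(['id', 'label'])[['id_customer', 'label_customer', 'Number']]
          .apply(lambda x: x.rename(columns=customer_keys).to_dict('records'))
          .reset_index(name='Customer')
          .to_dict(orient='records'))

print(d[0]['Customer'][0]['Number'])
```

Here id and label exist both at the top level and inside each Customer entry, with independent values, because each level is renamed separately.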

3rd Question

Here to_json from the original solution is replaced by to_dict, so the final list of dicts can still be modified (for example by appending the text info); json.dumps then produces the JSON in the last step:

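import json

def f(x):
    # Names listed in vals are kept as-is;
    # every other name is cut at the first underscore.
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]

# Build the nested records: Number lists per customer, then Customer lists per id/label.
d = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_dict(orient='records'))

#print (d)

# Turn the Additional_information column into a list of {'text': ...} dicts.
d1 = (data[['Additional_information']].rename(columns={'Additional_information': 'text'})
                                      .to_dict(orient='records'))
d1 = {'Additional_information': d1}
print(d1)
# {'Additional_information': [{'text': 'testing'}, {'text': 'testing again'}]}

# Append the text info as an extra element of the final list.
d.append(d1)
#print (d)

# Serialize the list of dicts to a JSON string in the last step.
j = json.dumps(d)
#print (j)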
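
If the text entries should sit next to the Customer list of their own id/label instead of being appended as a separate element, one alternative is to group the text column the same way and merge the two results. This is only a sketch under that assumption, not part of the solution above; the sample frame is rebuilt from the question and the customer columns are left unrenamed to keep it short.

```python
import json
import pandas as pd

# Sample data rebuilt from the question's second dataframe.
data = pd.DataFrame({
    'id': [6, 6],
    'label': ['Sao Paulo', 'Sao Paulo'],
    'id_customer': ['CUST-99992', 'CUST-99992'],
    'label_customer': ['Brazil', 'Brazil'],
    'part_number': ['7897', '7897'],
    'number_client': ['982', '982'],
    'Additional_information': ['testing', 'testing again'],
})

# One list of {'text': ...} dicts per (id, label) group.
info = (data.groupby(['id', 'label'])['Additional_information']
            .apply(lambda s: [{'text': t} for t in s])
            .reset_index(name='Additional_information'))

# One list of distinct customers per (id, label) group.
customers = (data.groupby(['id', 'label'])[['id_customer', 'label_customer']]
                 .apply(lambda x: x.drop_duplicates().to_dict('records'))
                 .reset_index(name='Customer'))

# Join both nested columns on the shared keys, then serialize.
merged = customers.merge(info, on=['id', 'label'])
j = json.dumps(merged.to_dict(orient='records'))
print(j)
```

Each top-level record then carries its own Additional_information list alongside its Customer list.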

solution1
1 ACCEPTED 2019-12-11 06:40:17
