Manipulating data of Pandas dataframe

Question

I'm reading a dataframe and converting it into a JSON file. I'm using Python 3 and pandas 0.25.3. I already got some help from you guys (Manipulating data of Pandas dataframe), but I have some questions about the code and how it works.

My dataframe:

id     label        id_customer     label_customer    part_number   number_client

6     Sao Paulo      CUST-99992         Brazil          7897           982

6     Sao Paulo      CUST-99992         Brazil          888            12

92    Hong Kong      CUST-88888         China           147            288

Code:

import pandas as pd

data = pd.read_excel(path)

data[["part_number","number_client"]] = data[["part_number","number_client"]].astype(str)

f = lambda x: x.split('_')[0]

# Select columns with a list ([[...]]) and spell out 'records': tuple-style
# selection and the 'r' shorthand no longer work in newer pandas releases.
j = (data.groupby(["id","label","id_customer","label_customer"])[["part_number","number_client"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
        .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
        .to_json(orient='records'))

print(j)

JSON I'm getting:

[{
        "id": 6,
        "label": "Sao Paulo",
        "Customer": [{
            "id": "CUST-99992",
            "label": "Brazil",
            "number": [{
                    "part": "7897",
                    "client": "982"
                },
                {
                    "part": "888",
                    "client": "12"
                }
            ]
        }]
    },
    {
        "id": 92,
        "label": "Hong Kong",
        "Customer": [{
            "id": "CUST-88888",
            "label": "China",
            "number": [{
                "part": "147",
                "client": "288"
            }]
        }]
    }
]

1st Question: the lambda passed to rename splits my column names wherever a _ is found. This is just a piece of my dataframe, and for some columns I'd like to preserve the full name. E.g. I want to get part_number and number_client instead of part and client in my JSON structure. How can I fix this?

2nd Question: I can have different lists with the same key name. E.g. in the customer list I have a part_number key, but I can also have a key with the same name inside another list with another value, e.g. part_number inside a test list.

3rd Question: In my complete dataframe, I have a column called Additional_information that holds simple text. I have to get a structure like this:

...

"Additional_information": [
        {
          "text": "testing"
        },
        {
          "text": "testing again"
        }
        ]

for a dataframe like this:

id     label        id_customer     label_customer    part_number   number_client    Additional_information

6     Sao Paulo      CUST-99992         Brazil          7897           982           testing

6     Sao Paulo      CUST-99992         Brazil          7897           982           testing again

What should I change?

Answer 1

1st Question:

You can write a custom function for rename that returns the listed column names unchanged and splits all the others, e.g.:

def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]
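
If the list of protected names changes between calls, a small factory may be handier than hard-coding vals. This is only a sketch: make_renamer is a hypothetical helper name, not something from pandas, and the returned function can be passed to rename(columns=...) in the same way as f.

```python
def make_renamer(keep):
    """Return a function that leaves names in `keep` alone and
    trims every other name at the first underscore."""
    keep = set(keep)

    def rename(name):
        return name if name in keep else name.split('_')[0]

    return rename


# Same behaviour as f above, with the protected names passed in.
f = make_renamer(['part_number', 'number_client'])

renamed = [f(c) for c in ['id_customer', 'label_customer',
                          'part_number', 'number_client']]
print(renamed)  # ['id', 'label', 'part_number', 'number_client']
```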

2nd Question:

If I understand correctly, the keys in the final JSON are created from the columns of the original DataFrame, and also from the name parameter of reset_index in my solution. If you want different logic for changing the keys (column names), you can change the function from the first solution.
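
One way to get different logic per list is to pass an explicit mapping to rename at each groupby level instead of a single shared function. A rough sketch, assuming the sample data from the question and a recent pandas; the customer_names mapping is only an illustration. Each nested list gets its own keys, so the same key name (here id and label) can appear at several levels with different values.

```python
import pandas as pd

# Sample rows from the question, typed in by hand.
data = pd.DataFrame({
    "id": [6, 6, 92],
    "label": ["Sao Paulo", "Sao Paulo", "Hong Kong"],
    "id_customer": ["CUST-99992", "CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "Brazil", "China"],
    "part_number": ["7897", "888", "147"],
    "number_client": ["982", "12", "288"],
})

# Inner level: no rename at all, so part_number / number_client stay as they are.
inner = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
             .apply(lambda g: g.to_dict("records"))
             .reset_index(name="Number"))

# Outer level: explicit mapping (hypothetical) instead of splitting on '_'.
customer_names = {"id_customer": "id", "label_customer": "label"}
outer = (inner.groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
              .apply(lambda g: g.rename(columns=customer_names).to_dict("records"))
              .reset_index(name="Customer"))

result = outer.to_dict(orient="records")
print(result)
```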

3rd Question:

The original solution is changed from to_json to to_dict so that the final list of dicts can be modified, e.g. by appending the text info; json.dumps is then used in the last step to produce the JSON:

import json

def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]

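# Inner groupby builds the 'Number' lists, outer groupby builds the 'Customer' lists.
# List-style column selection and 'records' also work on newer pandas releases.
d = (data.groupby(["id","label","id_customer","label_customer"])[["part_number","number_client"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
        .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
        .to_dict(orient='records'))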

#print (d)

d1 = (data[['Additional_information']].rename(columns={'Additional_information':'text'})
                                      .to_dict(orient='records'))
d1 = {'Additional_information':d1}
print(d1)
# {'Additional_information': [{'text': 'testing'}, {'text': 'testing again'}]}

d.append(d1)
#print (d)

j = json.dumps(d)
#print (j)
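
Note that d.append(d1) adds the text list as one extra element at the end of d. If the texts should instead be grouped per id and label, something like the following might work. This is a sketch under that assumption, which the question does not confirm, and it uses only the two sample rows from the question.

```python
import json
import pandas as pd

# The two sample rows from the question, reduced to the relevant columns.
data = pd.DataFrame({
    "id": [6, 6],
    "label": ["Sao Paulo", "Sao Paulo"],
    "Additional_information": ["testing", "testing again"],
})

# Build one list of {"text": ...} dicts per (id, label) group.
info = (data.groupby(["id", "label"])["Additional_information"]
            .apply(lambda s: [{"text": t} for t in s])
            .reset_index(name="Additional_information"))

records = info.to_dict(orient="records")
j = json.dumps(records)
print(j)
```

The resulting records could then be merged into d by matching on id and label.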

Answer 1 (accepted), 2019-12-11 06:40:17
