
Manipulating data of a dataframe in Pandas

I'm reading a dataframe and converting it into a JSON file. I'm using Python 3 and pandas 0.25.3. I already got some help from you guys (Manipulating data of Pandas dataframe), but I have some questions about the code and how it works.

My dataframe:

id    label         id_customer    label_customer    part_number    number_client
6     Sao Paulo     CUST-99992     Brazil            7897           982
6     Sao Paulo     CUST-99992     Brazil            888            12
92    Hong Kong     CUST-88888     China             147            288

Code:

import pandas as pd

data = pd.read_excel(path)

data[["part_number","number_client"]] = data[["part_number","number_client"]].astype(str)

f = lambda x: x.split('_')[0]

j = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
        .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
        .to_json(orient='records'))

print (j)
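To try the pipeline without the Excel file, here is a minimal self-contained sketch. It assumes the three sample rows above are built inline instead of loaded with pd.read_excel(path), and it selects columns with double brackets and spells the orient as 'records', since the tuple selection and the 'r' shorthand are deprecated in later pandas releases. The result is pretty-printed with json only to make it easier to read.

```python
import json

import pandas as pd

# Sample rows from the question, built inline (assumption: the real data comes from read_excel)
data = pd.DataFrame({
    "id": [6, 6, 92],
    "label": ["Sao Paulo", "Sao Paulo", "Hong Kong"],
    "id_customer": ["CUST-99992", "CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "Brazil", "China"],
    "part_number": [7897, 888, 147],
    "number_client": [982, 12, 288],
})

data[["part_number", "number_client"]] = data[["part_number", "number_client"]].astype(str)

# Shorten every column name to the text before the first '_'
f = lambda x: x.split('_')[0]

j = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         # inner level: one list of renamed part/number dicts per customer
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         # outer level: one list of customer dicts per id/label pair
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_json(orient='records'))

# Re-indent the compact JSON string for reading
print(json.dumps(json.loads(j), indent=4))
```

With this lambda, number_client is shortened to number (the text before the first _), and the key of each customer's list comes from reset_index(name='Number').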

JSON I'm getting:

[{
        "id": 6,
        "label": "Sao Paulo",
        "Customer": [{
            "id": "CUST-99992",
            "label": "Brazil",
            "number": [{
                    "part": "7897",
                    "client": "982"
                },
                {
                    "part": "888",
                    "client": "12"
                }
            ]
        }]
    },
    {
        "id": 92,
        "label": "Hong Kong",
        "Customer": [{
            "id": "CUST-88888",
            "label": "China",
            "number": [{
                "part": "147",
                "client": "288"
            }]
        }]
    }
]

1st Question: the lambda (applied through rename inside apply) splits my column names at the first _. That is just a piece of my dataframe, and there are some columns whose names I'd like to preserve. E.g., I want to get part_number and number_client instead of part and client in my JSON structure. How can I fix this?

2nd Question: I can have different lists with the same key name. E.g., in the customer list I have a part_number key, but I can also have a key with the same name and a different value inside another list, e.g. part_number inside a test list.

3rd Question: In my complete dataframe, I have a column called Additional_information that holds simple text. I need to get a structure like this:

...

"Additional_information": [
    {
        "text": "testing"
    },
    {
        "text": "testing again"
    }
]

for a dataframe like this:

id    label         id_customer    label_customer    part_number    number_client    Additional_information
6     Sao Paulo     CUST-99992     Brazil            7897           982              testing
6     Sao Paulo     CUST-99992     Brazil            7897           982              testing again

What should I change?

1st Question:

You can write a custom function for rename, e.g.:

def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]
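A quick way to check the renaming is to call rename with this function on the column names alone. This sketch uses a hypothetical empty frame that only carries the question's column names (plus the Number key created by reset_index):

```python
import pandas as pd

def f(x):
    # Column names that must keep their full name in the JSON output
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    # Every other name is shortened to the text before the first '_'
    return x.split('_')[0]

# Hypothetical empty frame, used only to see how the column names are renamed
cols = pd.DataFrame(columns=["id_customer", "label_customer", "part_number", "number_client", "Number"])
print(list(cols.rename(columns=f).columns))
# ['id', 'label', 'part_number', 'number_client', 'Number']
```

Any name not listed in vals is still shortened, so Additional_information would become Additional unless it is added to vals as well.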

2nd Question:

If I understand correctly, the keys in the final JSON are created from the columns of the original DataFrame, and also from the name parameter of reset_index in my solution. If you want some other logic for changing the keys (column names), you can change the function from the first solution.
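For example, instead of one shared renaming function, each groupby level could get its own mapping, so the same column can keep or change its name depending on which list it ends up in. This is a minimal sketch, assuming hypothetical dicts number_keys and customer_keys and the sample rows from the question built inline; columns missing from a mapping keep their original name.

```python
import pandas as pd

# Sample rows from the question, built inline (part/number values already as strings)
data = pd.DataFrame({
    "id": [6, 6, 92],
    "label": ["Sao Paulo", "Sao Paulo", "Hong Kong"],
    "id_customer": ["CUST-99992", "CUST-99992", "CUST-88888"],
    "label_customer": ["Brazil", "Brazil", "China"],
    "part_number": ["7897", "888", "147"],
    "number_client": ["982", "12", "288"],
})

# Hypothetical per-level mappings; any column not listed keeps its name
number_keys = {"number_client": "client"}  # part_number stays part_number
customer_keys = {"id_customer": "id", "label_customer": "label"}

d = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=number_keys).to_dict('records'))
         .reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=customer_keys).to_dict('records'))
         .reset_index(name='Customer')
         .to_dict(orient='records'))

print(d[0]["Customer"][0])
```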

3rd Question:

In the original solution, to_json is changed to to_dict so that the final list of dicts can be modified, e.g. by appending the text info; json.dumps is then used in the last step to create the JSON:

import json

def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]

d = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
        .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
        .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
        .to_dict(orient='records'))

#print (d)

d1 = (data[['Additional_information']].rename(columns={'Additional_information':'text'})
                                      .to_dict(orient='records'))
d1 = {'Additional_information':d1}
print (d1)
# {'Additional_information': [{'text': 'testing'}, {'text': 'testing again'}]}

d.append(d1)
#print (d)

j = json.dumps(d)
#print (j)
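Putting the third answer together on the two-row sample from the question gives the runnable sketch below. It assumes the rows are built inline instead of read from Excel, and uses the double-bracket selection and the 'records' spelling. As in the answer, the Additional_information dict is appended as a separate last element of the list, not nested inside the id/label record.

```python
import json

import pandas as pd

def f(x):
    # Keep these names; shorten all others to the text before the first '_'
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    return x.split('_')[0]

# Second sample dataframe from the question (assumed to be built inline)
data = pd.DataFrame({
    "id": [6, 6],
    "label": ["Sao Paulo", "Sao Paulo"],
    "id_customer": ["CUST-99992", "CUST-99992"],
    "label_customer": ["Brazil", "Brazil"],
    "part_number": ["7897", "7897"],
    "number_client": ["982", "982"],
    "Additional_information": ["testing", "testing again"],
})

d = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_dict(orient='records'))

# One {'text': ...} dict per row of Additional_information
d1 = (data[['Additional_information']].rename(columns={'Additional_information': 'text'})
                                      .to_dict(orient='records'))
d.append({'Additional_information': d1})

j = json.dumps(d, indent=4)
print(j)
```

Because both sample rows have the same part_number and number_client, the Number list contains that pair twice.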
