Manipulating data of a dataframe in Pandas
I'm reading a dataframe and converting it into a JSON file. I'm using Python 3 and pandas 0.25.3 for it. I already got some help from you guys (Manipulating data of Pandas dataframe), but I have some questions about the code and how it works.
My dataframe:
id  label      id_customer  label_customer  part_number  number_client
6   Sao Paulo  CUST-99992   Brazil          7897         982
6   Sao Paulo  CUST-99992   Brazil          888          12
92  Hong Kong  CUST-88888   China           147          288
Code:
import pandas as pd

data = pd.read_excel(path)
data[["part_number", "number_client"]] = data[["part_number", "number_client"]].astype(str)

# Keep only the part of a column name before the first underscore
f = lambda x: x.split('_')[0]

# Columns are selected with a list and to_dict uses the full 'records' name
j = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_json(orient='records'))
print(j)
JSON I'm getting:
[{
        "id": 6,
        "label": "Sao Paulo",
        "Customer": [{
            "id": "CUST-99992",
            "label": "Brazil",
            "Number": [{
                    "part": "7897",
                    "client": "982"
                },
                {
                    "part": "888",
                    "client": "12"
                }
            ]
        }]
    },
    {
        "id": 92,
        "label": "Hong Kong",
        "Customer": [{
            "id": "CUST-88888",
            "label": "China",
            "Number": [{
                "part": "147",
                "client": "288"
            }]
        }]
    }
]
1st Question: The lambda and apply functions are splitting my column names wherever a _ is found. This is just a piece of my dataframe, and there are some columns whose names I'd like to preserve. E.g.: I want to get part_number and number_client instead of part and client in my JSON structure. How can I fix this?
2nd Question: I can have different lists with the same key name. E.g.: in the customer list I have a part_number key, but I can also have a key with the same name inside another list, with a different value (for instance, part_number inside a test list).
3rd Question: In my complete dataframe, I have a column called Additional_information that holds simple text. I have to get a structure like this:
...
"Additional_information": [{
        "text": "testing"
    },
    {
        "text": "testing again"
    }
]
for a dataframe like this:
id  label      id_customer  label_customer  part_number  number_client  Additional_information
6   Sao Paulo  CUST-99992   Brazil          7897         982            testing
6   Sao Paulo  CUST-99992   Brazil          7897         982            testing again
What should I change?
1st Question:
You can write a custom function for the rename, e.g.:
def f(x):
    # Column names that should be kept unchanged
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        # Otherwise keep only the part before the first underscore
        return x.split('_')[0]
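To see the effect of such a function, here is a minimal sketch that applies it to a one-row frame built in memory. The column names and values come from the question. The constant name KEEP and the compact one-line form of f are only illustrative choices, not part of the original solution.

```python
import pandas as pd

# Columns whose names must survive the rename unchanged (illustrative constant).
KEEP = ['part_number', 'number_client']

def f(x):
    # Leave protected names alone; otherwise keep the text before the first underscore.
    return x if x in KEEP else x.split('_')[0]

df = pd.DataFrame({
    'id_customer': ['CUST-99992'],
    'label_customer': ['Brazil'],
    'part_number': ['7897'],
    'number_client': ['982'],
})

renamed = df.rename(columns=f)
print(list(renamed.columns))
# ['id', 'label', 'part_number', 'number_client']
```

Only id_customer and label_customer are shortened; the two protected columns keep their full names.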
2nd Question
If I understand correctly, the keys in the final JSON are created from the columns of the original DataFrame, and also from the name parameter of reset_index in my solution. If you want different logic for changing the keys (column names), you can change the function from the first solution.
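As a rough illustration of where each key comes from, the sketch below groups a tiny frame and names the nested list through reset_index. The frame is reduced to two columns from the question, and the key name 'Parts' is an arbitrary example rather than something from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [6, 6, 92],
    'part_number': ['7897', '888', '147'],
})

# Inner keys come from the column names of each group;
# the outer key for the nested list comes from reset_index(name=...).
records = (df.groupby('id')[['part_number']]
             .apply(lambda g: g.to_dict('records'))
             .reset_index(name='Parts')   # 'Parts' is an arbitrary example key
             .to_dict('records'))
print(records)
```

Because every nested list is built separately, the same inner key (here part_number) can appear in several lists with different values.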
3rd Question
In the original solution, to_json is changed to to_dict so that the final list of dicts can be modified (for example, by appending the text info); json.dumps is then used to produce the JSON in the last step:
import json

def f(x):
    vals = ['part_number', 'number_client']
    if x in vals:
        return x
    else:
        return x.split('_')[0]

# Same pipeline as before, but ending in to_dict so the result stays a Python list
d = (data.groupby(["id", "label", "id_customer", "label_customer"])[["part_number", "number_client"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Number')
         .groupby(["id", "label"])[["id_customer", "label_customer", "Number"]]
         .apply(lambda x: x.rename(columns=f).to_dict('records')).reset_index(name='Customer')
         .to_dict(orient='records'))
#print (d)

# Build the list of {'text': ...} dicts from the Additional_information column
d1 = (data[['Additional_information']].rename(columns={'Additional_information': 'text'})
                                       .to_dict(orient='records'))
d1 = {'Additional_information': d1}
print (d1)
{'Additional_information': [{'text': 'testing'}, {'text': 'testing again'}]}

d.append(d1)
#print (d)
j = json.dumps(d)
#print (j)
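For anyone who wants to try this without the Excel file, here is a self-contained sketch of the same steps. It assumes an in-memory frame in place of pd.read_excel(path), with values copied from the question's second sample table. It also uses list-style column selection and to_dict('records').

```python
import json
import pandas as pd

# In-memory stand-in for pd.read_excel(path); values copied from the question.
data = pd.DataFrame({
    'id': [6, 6],
    'label': ['Sao Paulo', 'Sao Paulo'],
    'id_customer': ['CUST-99992', 'CUST-99992'],
    'label_customer': ['Brazil', 'Brazil'],
    'part_number': ['7897', '7897'],
    'number_client': ['982', '982'],
    'Additional_information': ['testing', 'testing again'],
})

def f(x):
    # Keep the protected names, shorten everything else at the first underscore.
    keep = ['part_number', 'number_client']
    return x if x in keep else x.split('_')[0]

# Nest part/client rows under each customer, then customers under each id/label.
d = (data.groupby(['id', 'label', 'id_customer', 'label_customer'])[['part_number', 'number_client']]
         .apply(lambda x: x.rename(columns=f).to_dict('records'))
         .reset_index(name='Number')
         .groupby(['id', 'label'])[['id_customer', 'label_customer', 'Number']]
         .apply(lambda x: x.rename(columns=f).to_dict('records'))
         .reset_index(name='Customer')
         .to_dict(orient='records'))

# One {'text': ...} dict per row of the Additional_information column.
d1 = {'Additional_information':
      data[['Additional_information']]
      .rename(columns={'Additional_information': 'text'})
      .to_dict(orient='records')}

d.append(d1)
j = json.dumps(d, indent=2)
print(j)
```

Note that d.append(d1) adds the Additional_information dict as a separate element at the end of the top-level list; it is not nested inside the record for a particular id or customer.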