How to reshape nested JSON data in a DataFrame to get the desired output in Python?
Hi, I am trying to reshape this JSON data in a DataFrame using pandas.
id categories
1 3ee877e0 [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]
Expected result
id entity_def_id permalink uuid value
3ee877e0 category blockchain 1fea6201 Blockchain
3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3ee877e0 category marketplace-772d 772da8fe Marketplace
3ee877e0 category software c08b5441 Software
Sorry for not posting my own attempt, but I am new to Python. I already know how to do this in MongoDB and Dataiku and just want to know how to do it in Python.
You can try to explode the categories column, then convert the dictionaries in the categories column into multiple columns.
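import json
import pandas as pd
# Parse each JSON string into a list of dicts (json.loads avoids running eval on data),
# explode to one row per dict, then expand the dicts into columns next to id.
out = (df.assign(categories=df['categories'].apply(json.loads))
         .explode('categories', ignore_index=True)
         .pipe(lambda d: d.join(pd.DataFrame(d.pop('categories').values.tolist()))))
print(out)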
id entity_def_id permalink uuid value
0 3ee877e0 category blockchain 1fea6201 Blockchain
1 3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
2 3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3 3ee877e0 category marketplace-772d 772da8fe Marketplace
4 3ee877e0 category software c08b5441 Software
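The chained expression above can be easier to follow when split into its three steps. The following is a minimal sketch, assuming the categories column holds JSON strings as in the question; the shortened two-entry sample frame and the intermediate names (parsed, exploded, expanded) are illustrative only.

```python
import json
import pandas as pd

# Assumed sample input: one row whose categories cell is a JSON string
# (shortened to two entries for readability).
df = pd.DataFrame({
    "id": ["3ee877e0"],
    "categories": [
        '[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},'
        '{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]'
    ],
})

# Step 1: parse each JSON string into a list of dicts.
parsed = df.assign(categories=df["categories"].map(json.loads))

# Step 2: one row per dict; ignore_index=True gives a fresh 0..n-1 index.
exploded = parsed.explode("categories", ignore_index=True)

# Step 3: expand the dicts into columns and place them next to id.
expanded = pd.DataFrame(exploded["categories"].tolist())
out = pd.concat([exploded[["id"]], expanded], axis=1)
print(out)
```

Because explode repeats the id for every dictionary, the same steps should also apply when the frame has more than one row.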
You can take the list of dictionaries in categories, pass it as-is to DataFrame(), and then use insert to add your id column.
import pandas as pd
current_df = pd.DataFrame({"id": "3ee877e0","categories":[[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]]})
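# Take the id and the list of dicts from the first (and only) row.
id_ = current_df.loc[:, "id"].values[0]
categories = current_df.loc[:, "categories"].values[0]
# Build one row per dict, then insert the id as the first column.
new_df = pd.DataFrame(categories)
new_df.insert(0, "id", id_)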
Result
id entity_def_id permalink uuid value
0 3ee877e0 category blockchain 1fea6201 Blockchain
1 3ee877e0 category cryptocurrency bd082f4d Cryptocurrency
2 3ee877e0 category loyalty-programs 4a45af54 Loyalty Programs
3 3ee877e0 category marketplace-772d 772da8fe Marketplace
4 3ee877e0 category software c08b5441 Software
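Note that the snippet above reads values[0], so it only handles the first row. If the frame has several rows, one possible alternative is pd.json_normalize with record_path and meta. This is a sketch, not part of the original answer, and it assumes categories already holds Python lists of dicts; the second id (9ab1c2d3) and its category are made up for illustration.

```python
import pandas as pd

# Assumed input: categories already holds Python lists of dicts.
# The second row is hypothetical, added only to show the multi-row case.
current_df = pd.DataFrame({
    "id": ["3ee877e0", "9ab1c2d3"],
    "categories": [
        [
            {"entity_def_id": "category", "permalink": "blockchain", "uuid": "1fea6201", "value": "Blockchain"},
            {"entity_def_id": "category", "permalink": "software", "uuid": "c08b5441", "value": "Software"},
        ],
        [
            {"entity_def_id": "category", "permalink": "marketplace-772d", "uuid": "772da8fe", "value": "Marketplace"},
        ],
    ],
})

# json_normalize expects a list of records, so convert the frame first.
records = current_df.to_dict("records")

# record_path names the nested list to flatten; meta carries id onto each row.
out = pd.json_normalize(records, record_path="categories", meta=["id"])

# The meta column is appended last, so move id to the front.
out = out[["id"] + [c for c in out.columns if c != "id"]]
print(out)
```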