
Reshape nested JSON data in a dataframe using Python to get the desired output

Hi, I am trying to reshape this JSON data in a dataframe using pandas.

      id        categories
1     3ee877e0  [{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]

Expected result

id        entity_def_id  permalink         uuid        value
3ee877e0  category       blockchain        1fea6201    Blockchain
3ee877e0  category       cryptocurrency    bd082f4d    Cryptocurrency
3ee877e0  category       loyalty-programs  4a45af54    Loyalty Programs
3ee877e0  category       marketplace-772d  772da8fe    Marketplace
3ee877e0  category       software          c08b5441    Software

Sorry for not posting an attempt of my own. I am new to Python and already know how to do this in MongoDB and Dataiku; I just want to know how to do it with Python.

You can try to explode the categories column, then convert the dictionaries in the categories column into multiple columns.

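import json
import pandas as pd

# Parse each JSON string into a list of dicts (json.loads avoids running eval on raw data)
out = (df.assign(categories=df['categories'].apply(json.loads))
       .explode('categories', ignore_index=True)
       .pipe(lambda d: d.join(pd.DataFrame(d.pop('categories').values.tolist()))))
print(out)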

         id entity_def_id         permalink      uuid             value
0  3ee877e0      category        blockchain  1fea6201        Blockchain
1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
4  3ee877e0      category          software  c08b5441          Software
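
As a possible alternative to the explode approach above, pandas also ships `pd.json_normalize`, which can flatten a nested list and carry a parent field along with it. The sketch below assumes the categories column holds JSON strings, as in the question, and uses only the first two category entries from the question to keep the sample short. The names `records` and `flat` are illustrative, not from the original answers.

```python
import json
import pandas as pd

# Sample row from the question (shortened to two categories);
# `categories` is assumed to be stored as a JSON string.
df = pd.DataFrame({
    "id": ["3ee877e0"],
    "categories": [
        '[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},'
        '{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"}]'
    ],
})

# Turn each row into a plain dict whose `categories` is a parsed list of dicts.
records = [
    {"id": row_id, "categories": json.loads(cats)}
    for row_id, cats in zip(df["id"], df["categories"])
]

# record_path selects the nested list; meta repeats the parent id on every output row.
flat = pd.json_normalize(records, record_path="categories", meta=["id"])

# json_normalize appends meta columns at the end, so move id to the front.
flat = flat[["id", "entity_def_id", "permalink", "uuid", "value"]]
print(flat)
```

If the column already contains Python lists rather than strings, the `json.loads` call can presumably be dropped and the lists passed through unchanged.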

You can take the list of dictionaries in categories, pass it as-is to DataFrame(), and then use insert to add your id column.

import pandas as pd

current_df = pd.DataFrame({"id": "3ee877e0","categories":[[{"entity_def_id":"category","permalink":"blockchain","uuid":"1fea6201","value":"Blockchain"},{"entity_def_id":"category","permalink":"cryptocurrency","uuid":"bd082f4d","value":"Cryptocurrency"},{"entity_def_id":"category","permalink":"loyalty-programs","uuid":"4a45af54","value":"Loyalty Programs"},{"entity_def_id":"category","permalink":"marketplace-772d","uuid":"772da8fe","value":"Marketplace"},{"entity_def_id":"category","permalink":"software","uuid":"c08b5441","value":"Software"}]]})

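# This example handles a single row: take its id and its list of category dicts
id_ = current_df.loc[:, "id"].values[0]
categories = current_df.loc[:, "categories"].values[0]

# Build one row per dict, then put the id in the first column
new_df = pd.DataFrame(categories)
new_df.insert(0, "id", id_)
print(new_df)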

Result

         id entity_def_id         permalink      uuid             value
0  3ee877e0      category        blockchain  1fea6201        Blockchain
1  3ee877e0      category    cryptocurrency  bd082f4d    Cryptocurrency
2  3ee877e0      category  loyalty-programs  4a45af54  Loyalty Programs
3  3ee877e0      category  marketplace-772d  772da8fe       Marketplace
4  3ee877e0      category          software  c08b5441          Software
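
The snippet above reads only the first row (`values[0]`). If the dataframe has several ids, one way to extend the same idea is to build a small frame per row and concatenate them. This is a rough sketch, not part of the original answer: the second row (`example-id` and its category) is made up purely for illustration, and the first row is shortened to two of the question's categories.

```python
import pandas as pd

current_df = pd.DataFrame({
    # The second id and its category are hypothetical, added only to show multiple rows.
    "id": ["3ee877e0", "example-id"],
    "categories": [
        [
            {"entity_def_id": "category", "permalink": "blockchain", "uuid": "1fea6201", "value": "Blockchain"},
            {"entity_def_id": "category", "permalink": "software", "uuid": "c08b5441", "value": "Software"},
        ],
        [
            {"entity_def_id": "category", "permalink": "example", "uuid": "00000000", "value": "Example"},
        ],
    ],
})

frames = []
for id_, categories in zip(current_df["id"], current_df["categories"]):
    part = pd.DataFrame(categories)  # one row per category dict
    part.insert(0, "id", id_)        # repeat the parent id in the first column
    frames.append(part)

# Stack the per-id frames into one table with a fresh 0..n-1 index.
new_df = pd.concat(frames, ignore_index=True)
print(new_df)
```

For large inputs, the explode-based answer is likely the better fit, since it avoids a Python-level loop over rows.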


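Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.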

 