
Create a Pandas DataFrame from a nested dict

I have a nested dict with the following structure: each course_id maps to a nested dict holding 2 recommended courses and the number of purchases for each of them. For example, entries of this dict look something like this:

 {490: {566: 253, 551: 247},
 357: {571: 112, 356: 100},
 507: {570: 172, 752: 150}}

I tried this code to make a dataframe from this dict:

import pandas as pd

result = pd.DataFrame.from_dict(dicts, orient='index').stack().reset_index()
result.columns = ['Course ID', 'Recommended course', 'Number of purchases']

Please see the output.

This doesn't quite work for me, because I want an output with 5 columns: Course ID, recommended course 1, purchases 1, recommended course 2, purchases 2. Is there any solution for this? Thanks in advance.

I would recommend that you just reshape your dictionary and then re-create your dataframe; however, you're not far off from getting your target output from your current dataframe.
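For the "reshape the dictionary first" route, a minimal sketch could look like the following. The column names (`Recommended course 1`, `Purchases 1`, ...) and the variable names are my own choice for illustration, and it assumes the dict from the question is called `dicts`:

```python
import pandas as pd

dicts = {490: {566: 253, 551: 247},
         357: {571: 112, 356: 100},
         507: {570: 172, 752: 150}}

rows = []
for course_id, recs in dicts.items():
    # One flat record per course; keys are added in the desired column order.
    row = {'Course ID': course_id}
    for i, (rec_course, n_purchases) in enumerate(recs.items(), start=1):
        row[f'Recommended course {i}'] = rec_course
        row[f'Purchases {i}'] = n_purchases
    rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

This relies on dicts preserving insertion order (Python 3.7+), so "recommended course 1" is simply whichever entry comes first in each nested dict.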

We can group by Course ID and use cumcount to number the rows within each course, then unstack on that counter and build flat column names from the MultiIndex header that is created.

s1 = result.groupby(['Course ID',
             result.groupby(['Course ID']).cumcount() + 1]).first().unstack()

s1.columns = [f"{x}_{y}" for x,y in s1.columns]


              Recommended course_1  Recommended course_2  Number of purchases_1  \
Course ID                                                                      
357                         571                   356                  112.0   
490                         566                   551                  253.0   
507                         570                   752                  172.0   

           Number of purchases_2  
Course ID                         
357                        100.0  
490                        247.0  
507                        150.0
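If you want the columns interleaved in the order asked for in the question (course 1, purchases 1, course 2, purchases 2), one possible follow-up is to pick the flattened names in an explicit order. This is only a sketch: the long-format frame is built here with a list comprehension instead of `stack()` (my assumption, so the example is self-contained and the counts stay integers), and `ordered` and `wide` are names I made up.

```python
import pandas as pd

dicts = {490: {566: 253, 551: 247},
         357: {571: 112, 356: 100},
         507: {570: 172, 752: 150}}

# Long format: one row per (course, recommended course) pair.
result = pd.DataFrame(
    [(cid, rec, n) for cid, recs in dicts.items() for rec, n in recs.items()],
    columns=['Course ID', 'Recommended course', 'Number of purchases'])

# Same groupby / cumcount / unstack step as above.
counter = result.groupby(['Course ID']).cumcount() + 1
s1 = result.groupby(['Course ID', counter]).first().unstack()
s1.columns = [f"{x}_{y}" for x, y in s1.columns]

# Interleave the columns: course_1, purchases_1, course_2, purchases_2, ...
ordered = [f"{name}_{i}"
           for i in sorted(counter.unique())
           for name in ('Recommended course', 'Number of purchases')]
wide = s1[ordered].reset_index()
print(wide)
```

Note that the rows come out sorted by Course ID, because groupby sorts its keys by default.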

Not an efficient solution, but it should work in your case:

# `a` is the nested dict from the question
df = pd.DataFrame(
    [(k, list(v.keys())[0], list(v.values())[0], list(v.keys())[1], list(v.values())[1])
     for k, v in a.items()],
    columns=['Course ID', 'Recommended course 1', 'purchases 1',
             'Recommended Course 2', 'purchases 2'])
print(df)

Output:

   Course ID  Recommended course 1  purchases 1  Recommended Course 2  \
0        490                   566          253                   551
1        357                   571          112                   356
2        507                   570          172                   752

   purchases 2
0          247
1          100
2          150

You can use itertools.chain to flatten each nested dict into a flat list of alternating keys and values, store those lists in a dictionary d2 keyed by course id using a dict comprehension, and then build the dataframe from it with pandas.

import pandas as pd
from itertools import chain

d = {
    490: {566: 253, 551: 247},
    357: {571: 112, 356: 100},
    507: {570: 172, 752: 150}
}

d2 = {k: list(chain.from_iterable(v.items())) for k, v in d.items()}
df = pd.DataFrame.from_dict(d2, orient='index').reset_index()
df.columns = ['id','rec_course1', 'n_purch_1', 'rec_course2', 'n_purch_2']

df

    id   rec_course1  n_purch_1  rec_course2  n_purch_2
0  490           566        253          551        247
1  357           571        112          356        100
2  507           570        172          752        150
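To make the flattening step in the d2 comprehension a bit more concrete, here is what `chain.from_iterable` does to a single nested dict (the variable names `recs` and `flat` are just for illustration):

```python
from itertools import chain

recs = {566: 253, 551: 247}

# recs.items() yields (566, 253) and (551, 247);
# chain.from_iterable concatenates those pairs into one flat sequence.
flat = list(chain.from_iterable(recs.items()))
print(flat)  # [566, 253, 551, 247]
```

Each such list becomes one row of the dataframe, which is why the columns alternate between recommended course and number of purchases.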
