Storing a complex dictionary in a pandas DataFrame

Question

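This question follows on from my previous one. The dictionary below is the parent of the dictionary I previously asked about storing in a pandas DataFrame.

I have a dictionary: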

dictionary_example = {
    'New York': {
        1234: {'choice': 0, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 200, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        234: {'choice': 1, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 400}, 1: {'A': 100, 'B': 300, 'C': 1000}}},
        1876: {'choice': 2, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 100, 'B': 300, 'C': 1000}, 2: {'A': 600, 'B': 200, 'C': 100}}},
    },
    'London': {
        1534: {'choice': 0, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        2134: {'choice': 1, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 600}, 1: {'A': 170, 'B': 300, 'C': 1000}}},
        1776: {'choice': 2, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 500}, 1: {'A': 100, 'B': 300}, 2: {'A': 600, 'B': 200, 'C': 100}}},
    },
    'Paris': {
        1534: {'choice': 0, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        2134: {'choice': 1, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 600}, 1: {'A': 170, 'B': 300, 'C': 1000}}},
        1776: {'choice': 1, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 500}, 1: {'A': 100, 'B': 300}}},
    },
}

I would like to turn it into a pandas DataFrame like this (some of the specific values may not be exactly right):

id    choice  A_0   B_0   C_0   A_1   B_1   C_1   A_2   B_2   C_2   New York  London  Paris
1234  0       100   200   300   200   300   300   500   300   300   1         0       0
234   1       100   400   -     100   300   1000  -     -     -     1         0       0
1876  2       100   400   300   100   300   1000  600   200   100   1         0       0
1534  0       100   200   300   200   300   300   500   300   300   0         1       0
2134  1       100   400   -     100   300   1000  -     -     -     0         1       0
2006  2       100   400   300   100   300   1000  600   200   100   0         1       0
1264  0       100   200   300   200   300   300   500   300   300   0         0       1
1454  1       100   400   -     100   300   1000  -     -     -     0         0       1
1776  1       100   400   300   100   300   -     -     -     -     0         0       1
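
To make the target layout concrete, here is a minimal sketch of how a single record's choice_set could map onto the A_0 ... C_2 columns. The helper name flatten_record is hypothetical, and the sample is record 234 from the New York entry above; options missing from a choice set are simply left out of the row.

```python
def flatten_record(record_id, record):
    """Flatten one id's record into a single flat dict (hypothetical helper)."""
    row = {'id': record_id, 'choice': record['choice']}
    # Each question number holds an {option: value} dict; name columns option_question.
    for question, options in record['choice_set'].items():
        for option, value in options.items():
            row['{}_{}'.format(option, question)] = value
    return row


# Record 234 from the 'New York' entry of dictionary_example.
example = {'choice': 1, 'city': 'New York',
           'choice_set': {0: {'A': 100, 'B': 400},
                          1: {'A': 100, 'B': 300, 'C': 1000}}}
row = flatten_record(234, example)
print(row)
# {'id': 234, 'choice': 1, 'A_0': 100, 'B_0': 400, 'A_1': 100, 'B_1': 300, 'C_1': 1000}
```

The keys that never appear (C_0, A_2, B_2, C_2) correspond to the "-" cells in the table.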

In the previous question, someone kindly provided an approach for the sub-dictionary:

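import json
from io import StringIO

import pandas as pd

# Round-trip the dict through JSON; after transposing, each row is one id.
# Recent pandas versions expect a file-like object rather than a raw JSON string.
df = pd.read_json(StringIO(json.dumps(dictionary_example))).T


def to_s(r):
    # Flatten one choice_set dict into a Series indexed by (question, option).
    return pd.read_json(StringIO(json.dumps(r))).unstack()

flattened_choice_set = df["choice_set"].apply(to_s)

# Collapse the (question, option) pairs into names such as "0_A".
flattened_choice_set.columns = ['_'.join((str(col[0]), col[1])) for col in flattened_choice_set.columns]

result = pd.merge(df, flattened_choice_set,
         left_index=True, right_index=True).drop("choice_set", axis=1)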

Is there a way to do this for the larger dictionary?

All the best, Kevin

Answer 1

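As you noted, the solution provided previously is not a very nice one. The code below is more readable and solves your current problem. If possible, though, you should reconsider your data structure...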

import pandas as pd

df = pd.DataFrame()
question_ids = [0, 1, 2]

Create a row for each city/id combination and spread the dictionaries in the choice_set column into one column per question:

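city_frames = []
for _, city_value in dictionary_example.items():
    city_df = pd.DataFrame.from_dict(city_value).T
    city_df = city_df.join(pd.DataFrame(city_df["choice_set"].to_dict()).T)
    city_frames.append(city_df)
# DataFrame.append was removed in pandas 2.0, so concatenate once instead.
df = pd.concat(city_frames)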

Expand each question column into option columns with names like A_0, and join them onto df:

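for i in question_ids:
    # df[i].to_dict() is keyed by id, so an id that appears under several cities keeps only its last entry.
    choice_df = pd.DataFrame(df[i].to_dict()).T
    choice_df.columns = ["{}_{}".format(x, i) for x in choice_df.columns]
    df = df.join(choice_df)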

Fix up the city column:

# Turn the city column into one 0/1 indicator column per city.
df = pd.get_dummies(df, prefix="", prefix_sep="", columns=['city'], dtype=int)
df.drop(question_ids + ['choice_set'], axis=1, inplace=True)
# Optional to remove NaN from questions:
# df = df.fillna(0)
df
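
Since some ids in the question's data (1534, 2134 and 1776) appear under both London and Paris, an index-based join cannot tell those rows apart. A rough alternative sketch is to build one flat record per (city, id) pair in plain Python and only then hand the records to pandas. The names sample, build_rows and flat are hypothetical, sample is a trimmed-down stand-in for dictionary_example, and the city is taken from the outer key rather than from each record's 'city' field.

```python
import pandas as pd

# Trimmed sample in the same shape as dictionary_example;
# id 1534 deliberately appears under both cities.
sample = {
    'London': {
        1534: {'choice': 0, 'city': 'London',
               'choice_set': {0: {'A': 100, 'B': 400, 'C': 300},
                              1: {'A': 200, 'B': 300, 'C': 300}}},
    },
    'Paris': {
        1534: {'choice': 0, 'city': 'Paris',
               'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}}},
        2134: {'choice': 1, 'city': 'Paris',
               'choice_set': {0: {'A': 100, 'B': 600},
                              1: {'A': 170, 'B': 300, 'C': 1000}}},
    },
}


def build_rows(nested):
    """Yield one flat dict per (city, id) pair."""
    for city, people in nested.items():
        for record_id, record in people.items():
            row = {'id': record_id, 'choice': record['choice'], 'city': city}
            for question, options in record['choice_set'].items():
                for option, value in options.items():
                    row['{}_{}'.format(option, question)] = value
            yield row


# Missing options become NaN when the records are assembled.
flat = pd.DataFrame(list(build_rows(sample)))
# One 0/1 indicator column per city, named after the city itself.
flat = pd.get_dummies(flat, columns=['city'], prefix='', prefix_sep='', dtype=int)
print(flat)
```

Because id stays an ordinary column here instead of the index, repeated ids stay on separate rows; whether that is what you want depends on what an id means in your data.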

Answer 1
2, accepted, 2016-09-13 13:28:25
