Store a complex dictionary in a pandas DataFrame
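This question follows on from my previous one. The dictionary here is the parent of the one in my earlier question about storing a dictionary in a pandas DataFrame.
I have a dictionary: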
dictionary_example = {
    'New York': {
        1234: {'choice': 0, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 200, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        234: {'choice': 1, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 400}, 1: {'A': 100, 'B': 300, 'C': 1000}}},
        1876: {'choice': 2, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 100, 'B': 300, 'C': 1000}, 2: {'A': 600, 'B': 200, 'C': 100}}},
    },
    'London': {
        1534: {'choice': 0, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        2134: {'choice': 1, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 600}, 1: {'A': 170, 'B': 300, 'C': 1000}}},
        1776: {'choice': 2, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 500}, 1: {'A': 100, 'B': 300}, 2: {'A': 600, 'B': 200, 'C': 100}}},
    },
    'Paris': {
        1534: {'choice': 0, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        2134: {'choice': 1, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 600}, 1: {'A': 170, 'B': 300, 'C': 1000}}},
        1776: {'choice': 1, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 500}, 1: {'A': 100, 'B': 300}}},
    },
}
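I would like to turn it into a pandas DataFrame like the one below (some of the specific values may not be exactly accurate; a dash marks a missing value):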
id    choice  A_0  B_0  C_0  A_1  B_1  C_1   A_2  B_2  C_2  New York  London  Paris
1234  0       100  200  300  200  300  300   500  300  300  1         0       0
234   1       100  400  -    100  300  1000  -    -    -    1         0       0
1876  2       100  400  300  100  300  1000  600  200  100  1         0       0
1534  0       100  200  300  200  300  300   500  300  300  0         1       0
2134  1       100  400  -    100  300  1000  -    -    -    0         1       0
2006  2       100  400  300  100  300  1000  600  200  100  0         1       0
1264  0       100  200  300  200  300  300   500  300  300  0         0       1
1454  1       100  400  -    100  300  1000  -    -    -    0         0       1
1776  1       100  400  300  100  300  -     -    -    -    0         0       1
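One direct route from the nested dictionary to this table is to flatten it into one plain dict per (city, id) pair and let `pd.DataFrame` line the keys up as columns. A minimal sketch, assuming a trimmed copy of `dictionary_example` (two cities, two ids each); the helper name `flatten` is hypothetical, and `dtype=int` is passed to `get_dummies` so the city columns come out as 1/0 like the table rather than True/False:

```python
import pandas as pd

# Trimmed copy of dictionary_example (assumption: two cities, two ids each)
dictionary_example = {
    'New York': {
        1234: {'choice': 0, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 200, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        234: {'choice': 1, 'city': 'New York', 'choice_set': {0: {'A': 100, 'B': 400}, 1: {'A': 100, 'B': 300, 'C': 1000}}},
    },
    'London': {
        1534: {'choice': 0, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        2134: {'choice': 1, 'city': 'London', 'choice_set': {0: {'A': 100, 'B': 600}, 1: {'A': 170, 'B': 300, 'C': 1000}}},
    },
}


def flatten(nested):
    """Hypothetical helper: one flat dict per (city, id) with A_0, B_0, ... keys."""
    rows = []
    for city, entries in nested.items():
        for entry_id, entry in entries.items():
            row = {'id': entry_id, 'choice': entry['choice'], 'city': city}
            for set_idx, options in entry['choice_set'].items():
                for letter, value in options.items():
                    row[f'{letter}_{set_idx}'] = value  # e.g. "A_0"
            rows.append(row)
    return rows


# Missing options simply become NaN when the rows are aligned
df = pd.DataFrame(flatten(dictionary_example))
# One 1/0 column per city, named after the city itself
df = pd.get_dummies(df, columns=['city'], prefix='', prefix_sep='', dtype=int)
print(df)
```

Because the same id appears in more than one city in the full dictionary, the sketch keeps `id` as an ordinary column rather than using it as the row index.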
In my previous question, someone kindly suggested this approach for the sub-dictionary:
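import json
from io import StringIO
import pandas as pd
# JSON round trip (StringIO: literal JSON strings are deprecated since pandas 2.1); .T makes the ids the rows
df = pd.read_json(StringIO(json.dumps(dictionary_example))).T
def to_s(r):
    # Flatten one choice_set dict into a Series indexed by (choice-set index, letter)
    return pd.read_json(StringIO(json.dumps(r))).unstack()
flattened_choice_set = df["choice_set"].apply(to_s)
# Name the flattened columns "<index>_<letter>", e.g. "0_A"
flattened_choice_set.columns = ['_'.join((str(col[0]), col[1])) for col in flattened_choice_set.columns]
result = pd.merge(df, flattened_choice_set,
                  left_index=True, right_index=True).drop("choice_set", axis=1)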
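The key step in the quoted approach is `DataFrame.unstack`: on a frame with a flat index it returns a Series indexed by (column, row) pairs. A minimal sketch of that step on one choice set; building the frame directly with `pd.DataFrame` instead of the JSON round trip is a simplification of mine, not part of the quoted code:

```python
import pandas as pd

# One choice_set from the question (id 1234, New York)
choice_set = {0: {'A': 100, 'B': 200, 'C': 300},
              1: {'A': 200, 'B': 300, 'C': 300},
              2: {'A': 500, 'B': 300, 'C': 300}}

# Outer keys (0, 1, 2) become columns, inner keys (A, B, C) become the index
frame = pd.DataFrame(choice_set)

# unstack() on a flat-indexed frame gives a Series indexed by (column, row) pairs
flat = frame.unstack()

# Join each (set index, letter) pair into one label, as the quoted code does
flat.index = ['_'.join((str(i), letter)) for i, letter in flat.index]
print(flat.to_dict())
```

This is also why the quoted code produces labels such as `0_A` rather than the `A_0` shown in the desired table.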
Is there a way to do the same for the larger dictionary?
All the best, Kevin
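The solution you quoted from your previous question is not a very good one. The code below is more readable and solves your current problem. If possible, though, you should reconsider your data structure.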
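As one illustration of what a different data structure could look like (an assumption about the answer's suggestion, not the answerer's own code): a long, "tidy" table with one row per city, id, choice-set index and option, which pandas can pivot into the wide layout when needed. A minimal sketch with a trimmed copy of `dictionary_example`; the column names `set`, `option` and `value` are hypothetical:

```python
import pandas as pd

# Trimmed copy of dictionary_example (assumption: one city, two ids)
dictionary_example = {
    'Paris': {
        1534: {'choice': 0, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 300}, 1: {'A': 200, 'B': 300, 'C': 300}, 2: {'A': 500, 'B': 300, 'C': 300}}},
        1776: {'choice': 1, 'city': 'Paris', 'choice_set': {0: {'A': 100, 'B': 400, 'C': 500}, 1: {'A': 100, 'B': 300}}},
    },
}

# Long layout: one row per (city, id, choice-set index, option)
long_rows = [
    {'city': city, 'id': entry_id, 'choice': entry['choice'],
     'set': set_idx, 'option': letter, 'value': value}
    for city, entries in dictionary_example.items()
    for entry_id, entry in entries.items()
    for set_idx, options in entry['choice_set'].items()
    for letter, value in options.items()
]
long_df = pd.DataFrame(long_rows)

# Pivot to one row per (city, id, choice); columns are sorted (set, option) pairs
wide = long_df.pivot_table(index=['city', 'id', 'choice'],
                           columns=['set', 'option'], values='value')
wide.columns = [f'{option}_{set_idx}' for set_idx, option in wide.columns]
wide = wide.reset_index()
print(wide)
```

Missing options never appear as rows in the long table, so they only show up as NaN after the pivot. Note that `pivot_table` aggregates (mean by default), so the option values come back as floats.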
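import pandas as pd

question_ids = [0, 1, 2]

# Create one row per city/id combination and expand the choice_set dict into columns 0, 1, 2
frames = []
for city_value in dictionary_example.values():
    city_df = pd.DataFrame.from_dict(city_value).T
    city_df = city_df.join(pd.DataFrame(city_df["choice_set"].to_dict()).T)
    frames.append(city_df)
# DataFrame.append was removed in pandas 2.0, so concatenate once instead.
# The same id occurs in several cities, so move it into an "id" column to keep row labels unique
df = pd.concat(frames).rename_axis("id").reset_index()

# Join the oddly named choice-set columns (0, 1, 2) onto df as A_0, B_0, C_0, ...
for i in question_ids:
    # Each cell holds a dict of options, or NaN when that choice set is missing for the id
    choice_df = pd.DataFrame([d if isinstance(d, dict) else {} for d in df[i]], index=df.index)
    choice_df.columns = ["{}_{}".format(x, i) for x in choice_df.columns]
    df = df.join(choice_df)

# Fix the city column: one 1/0 indicator column per city
df = pd.get_dummies(df, prefix="", prefix_sep="", columns=['city'], dtype=int)
df.drop(question_ids + ['choice_set'], axis=1, inplace=True)
# Optional: replace the NaN left by missing options
# df = df.fillna(0)
df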
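One caveat on the optional `fillna(0)`: it makes a missing option indistinguishable from a genuine value of 0. If the option columns should stay whole numbers with visible gaps, pandas' nullable `Int64` dtype is one alternative. A minimal sketch on a hypothetical two-row slice of the result:

```python
import numpy as np
import pandas as pd

# Hypothetical slice of the result: missing options force these columns to float64
df = pd.DataFrame({'id': [1234, 234], 'C_0': [300.0, np.nan], 'A_2': [500.0, np.nan]})

# Nullable integer dtype: whole numbers become ints again and gaps show as <NA>, not 0
df = df.astype({'C_0': 'Int64', 'A_2': 'Int64'})
print(df)
```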