Convert a dictionary with a list of tuples to a DataFrame
I have a dictionary like this:
pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)]
}
I want to convert it into a DataFrame like this:
Text                               | Blue Whale | Ferrari   | african zebra | arabian horse | bobcat
('african zebra', 'arabian horse') | 0.49859235 | 0.5013809 | 0.49264234    | 0.5186422     | 0.5096679
('cheetah', 'mountain lion')       | 0.48881102 | 0.502793  | 0.48751196    | 0.49272105    | 0.5228181
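Every value in the dictionary is a list with exactly the same number of tuples, and the first elements of those tuples (the labels) are the same in every list. What I want is to put the dictionary keys in a "Text" column and use the first element of each tuple as the name of the remaining columns. The cell values are the scores, which are floats.
Any suggestions would be helpful. Here is what I have tried so far: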
In [12]: text = list(pred_dict.keys())
In [13]: values = list(pred_dict.values())
In [14]: pred_df = pd.DataFrame({'text': text, 'label_scores': values})
In [15]: pred_df
Out[15]:
text label_scores
0 (african zebra, arabian horse) [(Blue Whale, 0.49859235), (Ferrari, 0.5013809...
1 (cheetah, mountain lion) [(Blue Whale, 0.48881102), (Ferrari, 0.502793)...
In [19]: df_scores = pred_df['label_scores']
In [21]: df_scores
Out[21]:
0 [(Blue Whale, 0.49859235), (Ferrari, 0.5013809...
1 [(Blue Whale, 0.48881102), (Ferrari, 0.502793)...
Name: label_scores, dtype: object
In [22]: labels = [t[1] for t in df_scores[0]]
In [23]: labels
Out[23]: [0.49859235, 0.5013809, 0.49264234, 0.5186422, 0.5096679]
In [24]: labels = [t[0] for t in df_scores[0]]
In [25]: labels
Out[25]: ['Blue Whale', 'Ferrari', 'african zebra', 'arabian horse', 'bobcat']
In [26]: scores = [t[1] for t in df_scores[0]]
In [27]: scores
Out[27]: [0.49859235, 0.5013809, 0.49264234, 0.5186422, 0.5096679]
In [28]: scores = [t[1] for t in df_scores[1]]
In [29]: scores
Out[29]: [0.48881102, 0.502793, 0.48751196, 0.49272105, 0.5228181]
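The exploratory steps above already pull out the labels and the per-key scores, so one way to finish the job is to assemble them directly. The following is a minimal sketch, not a definitive solution: it assumes, as stated in the question, that every list holds the same labels in the same order, and the names `first_pairs` and `score_rows` are illustrative only.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

# Column names come from the first list. This relies on the assumption
# that all lists share the same labels in the same order.
first_pairs = next(iter(pred_dict.values()))
labels = [label for label, _ in first_pairs]

# One row of scores per dictionary key, in the same label order.
score_rows = [[score for _, score in pairs] for pairs in pred_dict.values()]

pred_df = pd.DataFrame(score_rows, columns=labels)
# Put the dictionary keys (tuples) in a leading "Text" column.
pred_df.insert(0, "Text", list(pred_dict.keys()))
print(pred_df)
```

Because the scores are matched to columns purely by position, this sketch would silently mislabel values if any list used a different order.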
pred_dict = {('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809), ('african zebra', 0.49264234), ('arabian horse', 0.5186422), ('bobcat', 0.5096679)], ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793), ('african zebra', 0.48751196), ('arabian horse', 0.49272105), ('bobcat', 0.5228181)]}
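This should do it:
import pandas as pd

# Build one small frame per key, stack them, then pivot the labels into columns.
result = (
    pd.concat(
        [pd.DataFrame(r, columns=['Text', 'value'], index=[t] * len(r))
         for t, r in pred_dict.items()]
    )
    .set_index('Text', append=True)
    .unstack('Text')['value']
)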
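It is not very pretty, but it works:
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)]
}
# After transposing, each row is one key and each cell holds a (label, score) tuple.
df = pd.DataFrame(pred_dict).T
# Take the column names from the labels in the first row.
df.columns = [pair[0] for pair in df.iloc[0]]
# Keep only the score from each (label, score) tuple.
df = df.apply(lambda col: [pair[1] for pair in col])
# The tuple keys end up as two index levels; merge them back into one "Text" column.
df.reset_index(inplace=True)
df.insert(0, "Text", list(zip(df.level_0, df.level_1)))
df.drop(["level_0", "level_1"], axis=1, inplace=True)
The output is: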
Text Blue Whale ... arabian horse bobcat
0 (african zebra, arabian horse) 0.498592 ... 0.518642 0.509668
1 (cheetah, mountain lion) 0.488811 ... 0.492721 0.522818
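An alternative that avoids the transpose and the index clean-up is to turn each list of `(label, score)` tuples into a plain dict first, so that pandas aligns the columns by label. This is only a sketch under the assumption that labels are unique within each list; the name `records` is illustrative.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

# dict(pairs) maps label -> score, so each record is one row keyed by label.
records = [dict(pairs) for pairs in pred_dict.values()]

# A list of dicts becomes a frame whose columns are the dict keys.
scores = pd.DataFrame(records)

# Add the original tuple keys as the leading "Text" column.
scores.insert(0, "Text", list(pred_dict.keys()))
print(scores)
```

Since the columns are matched by label rather than by position, this version does not depend on the tuples appearing in the same order in every list.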
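OK. After some experimenting, I got it working. Here is my approach:
import pandas as pd

# mlb_classes is not defined in this snippet; it is assumed to be an
# iterable of the label names (the first element of each tuple).
text = list(pred_dict.keys())
values = list(pred_dict.values())
df_1 = pd.DataFrame({'text': text})
score_dict = {}
for label in mlb_classes:
    # Collect the score for this label from every list of tuples.
    score_list = []
    for t_list in values:
        for t in t_list:
            if t[0] == label:
                score_list.append(t[1])
    score_dict[label] = score_list
df_2 = pd.DataFrame(score_dict)
score_df = pd.concat([df_1, df_2], axis=1)
print(score_df)
Output: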
text Blue Whale Ferrari african zebra arabian horse bobcat
0 (african zebra, arabian horse) 0.519343 0.511951 0.512639 0.527919 0.491461 0.516240
1 (cheetah, mountain lion) 0.495197 0.527627 0.497516 0.512571 0.488823 0.510277
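The loop above depends on `mlb_classes`, which the snippet does not define. If no such list is at hand, one option is to derive the labels from the dictionary itself and confirm that every list really uses the same labels in the same order, as the question states. The helper name `common_labels` below is hypothetical, and the sketch uses only plain Python.

```python
pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}


def common_labels(mapping):
    """Return the shared label list, or raise if the lists disagree."""
    all_pairs = list(mapping.values())
    if not all_pairs:
        return []
    # Labels of the first list serve as the reference.
    expected = [label for label, _ in all_pairs[0]]
    for key, pairs in mapping.items():
        found = [label for label, _ in pairs]
        if found != expected:
            raise ValueError(f"labels for {key!r} differ: {found} != {expected}")
    return expected


# Could stand in for mlb_classes in the loop above (an assumption).
labels = common_labels(pred_dict)
print(labels)
```

Raising on a mismatch makes a violated assumption visible early instead of producing columns of unequal length later.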