
Convert a dictionary with lists of tuples to a DataFrame

I have a dictionary like this:

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)]
}

I would like to convert it to a DataFrame like this:

Text                               | Blue Whale | Ferrari   | african zebra | arabian horse | bobcat
('african zebra', 'arabian horse') | 0.49859235 | 0.5013809 | 0.49264234    | 0.5186422     | 0.5096679
('cheetah', 'mountain lion')       | 0.48881102 | 0.502793  | 0.48751196    | 0.49272105    | 0.5228181

Each value in the dictionary is a list with the same number of tuples, and the first elements of those tuples (the labels) are identical across all the lists. I want to put the dictionary keys in a 'Text' column and use the first elements of the tuples as the remaining column names. The cell values are the scores (floats).
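A minimal sketch of that reshaping, assuming pandas is available: turn each list of `(label, score)` pairs into a dict, let pandas align those dicts by label, then insert the original keys as a leading `Text` column. The names `rows` and `scores` are illustrative only.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234), ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196), ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

# One dict per key, e.g. {'Blue Whale': 0.49859235, 'Ferrari': 0.5013809, ...};
# pandas turns the dict keys (the labels) into column names, in first-seen order.
rows = [dict(pairs) for pairs in pred_dict.values()]
scores = pd.DataFrame(rows)

# Put the original dictionary keys (tuples) in a leading "Text" column.
scores.insert(0, "Text", list(pred_dict.keys()))
print(scores)
```

Because scores are matched by label name rather than by position, this still works if some list orders its labels differently; a label missing from one list would show up as NaN in that row.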

Any suggestions would be helpful. Here is what I am trying now:

In [12]: text = list(pred_dict.keys())

In [13]: values = list(pred_dict.values())

In [14]: pred_df = pd.DataFrame({'text': text, 'label_scores': values})

In [15]: pred_df
Out[15]:
                             text                                       label_scores
0  (african zebra, arabian horse)  [(Blue Whale, 0.49859235), (Ferrari, 0.5013809...
1        (cheetah, mountain lion)  [(Blue Whale, 0.48881102), (Ferrari, 0.502793)...

In [19]: df_scores = pred_df['label_scores']
In [21]: df_scores
Out[21]:
0    [(Blue Whale, 0.49859235), (Ferrari, 0.5013809...
1    [(Blue Whale, 0.48881102), (Ferrari, 0.502793)...
Name: label_scores, dtype: object

In [22]: labels = [t[1] for t in df_scores[0]]

In [23]: labels
Out[23]: [0.49859235, 0.5013809, 0.49264234, 0.5186422, 0.5096679]

In [24]: labels = [t[0] for t in df_scores[0]]

In [25]: labels
Out[25]: ['Blue Whale', 'Ferrari', 'african zebra', 'arabian horse', 'bobcat']

In [26]: scores = [t[1] for t in df_scores[0]]

In [27]: scores
Out[27]: [0.49859235, 0.5013809, 0.49264234, 0.5186422, 0.5096679]

In [28]: scores = [t[1] for t in df_scores[1]]

In [29]: scores
Out[29]: [0.48881102, 0.502793, 0.48751196, 0.49272105, 0.5228181]
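Continuing along the lines of the session above, a sketch that relies on the assumption stated in the question (every list has the same labels in the same order): take the column names from the first row, collect each row's scores by position, and concatenate them next to the `text` column. `score_rows` and `score_df` are illustrative names.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234), ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196), ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

text = list(pred_dict.keys())
values = list(pred_dict.values())
pred_df = pd.DataFrame({'text': text, 'label_scores': values})

# Column names: the labels of the first row (assumes all rows share this order)
labels = [t[0] for t in pred_df['label_scores'].iloc[0]]
# One list of scores per row, in that same positional order
score_rows = [[t[1] for t in row] for row in pred_df['label_scores']]

# Both frames have a default 0..n-1 index, so concat lines the rows up
score_df = pd.concat(
    [pred_df[['text']], pd.DataFrame(score_rows, columns=labels)],
    axis=1,
)
print(score_df)
```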

This should do it:

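import pandas as pd

# One small frame per key: the (label, score) pairs as rows, every row indexed by the key.
# Note that the label column is the one named 'Text' here.
frames = [pd.DataFrame(r, columns=['Text', 'value'], index=[t] * len(r))
          for (t, r) in pred_dict.items()]

# Stack the frames, add the label column as a second index level,
# then unstack that level so each label becomes a column.
result = (pd.concat(frames)
          .set_index('Text', append=True)
          .unstack('Text')['value'])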

Produces this: [image of the resulting DataFrame not preserved]
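The same long-to-wide idea can also be sketched with `DataFrame.pivot`. This is a swapped-in variant, not the answer's exact code: to keep the tuple keys out of the index, it pivots on a hypothetical integer `row` column and adds the keys back as a `Text` column afterwards.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234), ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196), ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

# Long format: one (row number, label, score) record per pair
long_df = pd.DataFrame(
    [(i, label, score)
     for i, pairs in enumerate(pred_dict.values())
     for label, score in pairs],
    columns=["row", "label", "score"],
)

# Wide format: one row per key, one column per label (pivot sorts the labels)
wide = long_df.pivot(index="row", columns="label", values="score")
wide.columns.name = None            # drop the leftover "label" axis name
wide = wide.reset_index(drop=True)  # plain 0..n-1 index instead of "row"

# Restore the original dictionary keys as the first column
wide.insert(0, "Text", list(pred_dict.keys()))
print(wide)
```

If a label could appear twice for the same key, `pivot` raises a `ValueError`; `pivot_table` with an aggregation function would be needed instead.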

It's not pretty but it works:

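import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235),
                                         ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234),
                                         ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102),
                                   ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196),
                                   ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)]
}

# Keys become a two-level row index; each cell holds one (label, score) pair
df = pd.DataFrame(pred_dict).T
# Use the labels from the first row as column names
df.columns = [pair[0] for pair in df.iloc[0]]
# Replace every (label, score) pair with just its score
df = df.apply(lambda col: [pair[1] for pair in col])
# Move the two index levels into columns named level_0 and level_1
df.reset_index(inplace=True)
# Recombine them into a single "Text" column of tuples, placed first
df.insert(0, "Text", list(zip(df.level_0, df.level_1)))
df.drop(["level_0", "level_1"], axis=1, inplace=True)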

The output is:

                             Text  Blue Whale  ...  arabian horse    bobcat
0  (african zebra, arabian horse)    0.498592  ...       0.518642  0.509668
1        (cheetah, mountain lion)    0.488811  ...       0.492721  0.522818

OK, after some trial and error I was able to do it. Here is how:

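import pandas as pd

text = list(pred_dict.keys())
values = list(pred_dict.values())
df_1 = pd.DataFrame({'text': text})

# mlb_classes is not defined in this snippet; it is assumed here to be the
# list of label names, taken from the first entry.
mlb_classes = [t[0] for t in values[0]]

# For every label, collect its score from each row, in row order
score_dict = {}
for label in mlb_classes:
    score_list = []
    for t_list in values:
        for t in t_list:
            if t[0] == label:
                score_list.append(t[1])
    score_dict[label] = score_list

df_2 = pd.DataFrame(score_dict)

# Put the text column and the score columns side by side
score_df = pd.concat([df_1, df_2], axis=1)
print(score_df)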
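The nested loops above scan every pair once per label. A sketch of an equivalent that converts each row to a dict once and then looks scores up by label; here `mlb_classes` is assumed to be the list of label names, taken from the first entry, and `row_dicts` is an illustrative name.

```python
import pandas as pd

pred_dict = {
    ('african zebra', 'arabian horse'): [('Blue Whale', 0.49859235), ('Ferrari', 0.5013809),
                                         ('african zebra', 0.49264234), ('arabian horse', 0.5186422),
                                         ('bobcat', 0.5096679)],
    ('cheetah', 'mountain lion'): [('Blue Whale', 0.48881102), ('Ferrari', 0.502793),
                                   ('african zebra', 0.48751196), ('arabian horse', 0.49272105),
                                   ('bobcat', 0.5228181)],
}

text = list(pred_dict.keys())
values = list(pred_dict.values())
# Assumption: label names taken from the first entry (stand-in for mlb_classes)
mlb_classes = [label for label, _ in values[0]]

# Convert each row's pairs to a dict once, then look each score up by label
row_dicts = [dict(t_list) for t_list in values]
score_dict = {label: [row[label] for row in row_dicts] for label in mlb_classes}

score_df = pd.concat([pd.DataFrame({'text': text}), pd.DataFrame(score_dict)], axis=1)
print(score_df)
```

One behavioral difference: if a label were missing from some row, `row[label]` raises a `KeyError` immediately, whereas the loop version builds a shorter list and only fails later, when `pd.DataFrame(score_dict)` finds columns of unequal length.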

Output:

     text  Blue Whale   Ferrari  african zebra  arabian horse    bobcat    
0  (african zebra, arabian horse)    0.519343  0.511951       0.512639       0.527919  0.491461  0.516240  
1        (cheetah, mountain lion)    0.495197  0.527627       0.497516       0.512571  0.488823  0.510277  

The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM