
Create a pandas DataFrame from a dict whose values are lists of tuples, with each column name unique

I have two lists that I use to build a dictionary: list1 holds text data and list2 holds (text, float) tuples. From this dictionary I want a DataFrame in which each row is labeled with an element of list1, each column is named after a unique text term (the first element of the tuples), and each cell holds the float value that links the row term to the column term.

For example, here is the dictionary. Its keys are be, associate, induce and represent, and each value is a list of (verb, score) tuples such as [('prove', 0.583171546459198), ('serve', 0.4951282739639282), ...]:

{'be': [('prove', 0.583171546459198), ('serve', 0.4951282739639282), ('render', 0.4826732873916626), ('represent', 0.47748714685440063), ('lead', 0.47725602984428406), ('replace', 0.4695377051830292), ('contribute', 0.4529820680618286)], 
'associate': [('interact', 0.8237789273262024), ('colocalize', 0.6831706762313843)], 
'induce': [('suppress', 0.8159114718437195), ('provoke', 0.7866303324699402), ('elicit', 0.7509980201721191), ('inhibit', 0.7498961687088013), ('potentiate', 0.742023229598999), ('produce', 0.7384929656982422), ('attenuate', 0.7352016568183899), ('abrogate', 0.7260081768035889), ('trigger', 0.717864990234375), ('stimulate', 0.7136563658714294)], 
'represent': [('prove', 0.6612186431884766), ('evoke', 0.6591314673423767), ('up-regulate', 0.6582908034324646), ('synergize', 0.6541063785552979), ('activate', 0.6512928009033203), ('mediate', 0.6494284272193909)]}

Desired Output

               prove    serve  render    represent  
    be          0.58     0.49   0.48        0.47             
    associate   0        0      0           0       
    induce      0.45     0.58   0.9         0.7
    represent   0.66     0      0           1       

What trips me up is that a verb such as prove appears under more than one key (for the key be its score is 0.58, and for the key represent it is 0.66).
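A repeated verb only matters for a positional layout; if the column labels are taken from the set of all verbs, each verb appears exactly once. A minimal sketch in plain Python, using a trimmed copy of d (two keys only, chosen for brevity):

```python
# Trimmed copy of the question's dictionary (two keys only, for brevity).
d = {
    "be": [("prove", 0.583171546459198), ("serve", 0.4951282739639282)],
    "represent": [("prove", 0.6612186431884766), ("evoke", 0.6591314673423767)],
}

# A set keeps each verb once, even though "prove" occurs under both keys.
columns = sorted({verb for pairs in d.values() for verb, _score in pairs})
print(columns)  # ['evoke', 'prove', 'serve']
```

The different scores for prove are then not a conflict: each one belongs to a distinct (key, verb) cell.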

If I use df = pd.DataFrame.from_dict(d, orient='index'), prove shows up twice (once for be and once for represent) instead of as a single column, whereas I want each term to appear exactly once, as its own column.
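To see why the naive call does not give one column per verb, here is a sketch of what it produces on a trimmed copy of d (padding details may vary between pandas versions): the column labels are positions 0, 1, ..., and each cell holds a whole (verb, score) tuple, so the same verb can turn up in several rows.

```python
import pandas as pd

# Trimmed copy of the question's dictionary (two keys only, for brevity).
d = {
    "be": [("prove", 0.583171546459198), ("serve", 0.4951282739639282)],
    "associate": [("interact", 0.8237789273262024)],
}

# Each list is treated as a row of positional values: columns are 0, 1, ...
# and every cell holds a whole (verb, score) tuple; shorter lists are padded.
df = pd.DataFrame.from_dict(d, orient="index")
print(df)
```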

Can someone help?

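With the dictionary you provided (as d), you can't use from_dict directly: from_dict(..., orient='index') only turns the verbs into column names when each value is itself a dict mapping column name to value, and here each value is a list of tuples.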

Either convert each list of tuples into a dictionary first:

pd.DataFrame.from_dict({k: dict(v) for k,v in d.items()}, orient='index')
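A runnable sketch of this first approach on a trimmed copy of d. The trailing fillna(0) is an addition, not part of the one-liner above; it makes verbs that are missing for a key show up as 0, as in the desired output, instead of NaN:

```python
import pandas as pd

# Trimmed copy of the question's dictionary.
d = {
    "be": [("prove", 0.583171546459198), ("serve", 0.4951282739639282)],
    "associate": [("interact", 0.8237789273262024)],
    "represent": [("prove", 0.6612186431884766), ("evoke", 0.6591314673423767)],
}

# dict(v) turns [(verb, score), ...] into {verb: score, ...}, so every verb
# becomes a single column label; absent (key, verb) pairs come out as NaN.
df = pd.DataFrame.from_dict({k: dict(v) for k, v in d.items()}, orient="index")

# Replace NaN with 0 to match the desired output in the question.
df = df.fillna(0)
print(df)
```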

Or read it as a Series and reshape it:

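import pandas as pd

(pd.Series(d).explode()             # one (verb, score) tuple per row, indexed by key
   .apply(pd.Series)                # split each tuple into column 0 (verb) and column 1 (score)
   .set_index(0, append=True)[1]    # index becomes (key, verb); keep only the scores
   .unstack(fill_value=0)           # move verbs into columns; absent pairs become 0
)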
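The same reshape can also be written as a long table of records followed by pivot, which some may find easier to read. This is a swapped-in alternative rather than the chain above, and the column names key, verb and score are made up for the sketch:

```python
import pandas as pd

# Trimmed copy of the question's dictionary.
d = {
    "be": [("prove", 0.583171546459198), ("serve", 0.4951282739639282)],
    "associate": [("interact", 0.8237789273262024)],
    "represent": [("prove", 0.6612186431884766), ("evoke", 0.6591314673423767)],
}

# Long format: one (key, verb, score) record per tuple.
records = [(key, verb, score) for key, pairs in d.items() for verb, score in pairs]
long_df = pd.DataFrame(records, columns=["key", "verb", "score"])

# pivot() raises if a (key, verb) pair occurs twice; here each verb appears
# at most once per key, so it works. fillna(0) zeroes the missing pairs.
wide = long_df.pivot(index="key", columns="verb", values="score").fillna(0)
print(wide)
```

If a verb could repeat within one key's list, pivot_table with an explicit aggfunc would be needed instead.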

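Output of the first approach (pairs that do not occur are NaN; the second approach fills them with 0 via fill_value=0, and its row and column order may differ):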

              prove     serve    render  represent      lead   replace  \
be         0.583172  0.495128  0.482673   0.477487  0.477256  0.469538   
represent  0.661219       NaN       NaN        NaN       NaN       NaN   
associate       NaN       NaN       NaN        NaN       NaN       NaN   
induce          NaN       NaN       NaN        NaN       NaN       NaN   

           contribute  interact  colocalize  suppress  ...   produce  \
be           0.452982       NaN         NaN       NaN  ...       NaN   
represent         NaN       NaN         NaN       NaN  ...       NaN   
associate         NaN  0.823779    0.683171       NaN  ...       NaN   
induce            NaN       NaN         NaN  0.815911  ...  0.738493   

           attenuate  abrogate   trigger  stimulate     evoke  up-regulate  \
be               NaN       NaN       NaN        NaN       NaN          NaN   
represent        NaN       NaN       NaN        NaN  0.659131     0.658291   
associate        NaN       NaN       NaN        NaN       NaN          NaN   
induce      0.735202  0.726008  0.717865   0.713656       NaN          NaN   

           synergize  activate   mediate  
be               NaN       NaN       NaN  
represent   0.654106  0.651293  0.649428  
associate        NaN       NaN       NaN  
induce           NaN       NaN       NaN  

[4 rows x 24 columns]
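To get closer to the layout sketched in the question, one option is to keep a chosen set of verbs, zero the gaps and round to two decimals. The column list below simply copies the four verbs from the question's sketch; it is an illustration, not something the data dictates, and d is again trimmed:

```python
import pandas as pd

# Trimmed copy of the question's dictionary.
d = {
    "be": [("prove", 0.583171546459198), ("serve", 0.4951282739639282),
           ("render", 0.4826732873916626), ("represent", 0.47748714685440063)],
    "associate": [("interact", 0.8237789273262024)],
    "represent": [("prove", 0.6612186431884766), ("evoke", 0.6591314673423767)],
}

df = pd.DataFrame.from_dict({k: dict(v) for k, v in d.items()}, orient="index")

# Illustrative column choice: the four verbs from the question's sketch.
wanted = ["prove", "serve", "render", "represent"]

# reindex keeps only the wanted verbs (adding any absent one as all-NaN),
# fillna(0) zeroes the gaps, and round(2) matches the two-decimal display.
subset = df.reindex(columns=wanted).fillna(0).round(2)
print(subset)
```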

The technical post webpages of this site follow the CC BY-SA 4.0 license.