
Create a pandas DataFrame from a dict where values are lists of tuples and each column name is unique

I have two lists: list1 holds text data, and list2 is a list of (text, float) tuples. I use them to build a dictionary, and the goal is to create a DataFrame in which the first column contains the elements of list1, there is one column for each unique text term that appears as the first element of a tuple, and each cell holds the float value that connects the row's key to the column's term.

For example, here is the dictionary, with keys be, associate, induce, represent and values that are lists of tuples such as ('prove', 0.583171546459198), ('serve', 0.4951282739639282), etc.:

{'be': [('prove', 0.583171546459198), ('serve', 0.4951282739639282), ('render', 0.4826732873916626), ('represent', 0.47748714685440063), ('lead', 0.47725602984428406), ('replace', 0.4695377051830292), ('contribute', 0.4529820680618286)], 
'associate': [('interact', 0.8237789273262024), ('colocalize', 0.6831706762313843)], 
'induce': [('suppress', 0.8159114718437195), ('provoke', 0.7866303324699402), ('elicit', 0.7509980201721191), ('inhibit', 0.7498961687088013), ('potentiate', 0.742023229598999), ('produce', 0.7384929656982422), ('attenuate', 0.7352016568183899), ('abrogate', 0.7260081768035889), ('trigger', 0.717864990234375), ('stimulate', 0.7136563658714294)], 
'represent': [('prove', 0.6612186431884766), ('evoke', 0.6591314673423767), ('up-regulate', 0.6582908034324646), ('synergize', 0.6541063785552979), ('activate', 0.6512928009033203), ('mediate', 0.6494284272193909)]}

Desired output:

               prove    serve  render    represent  
    be          0.58     0.49   0.48        0.47             
    associate   0        0      0           0       
    induce      0.45     0.58   0.9         0.7
    represent   0.66     0      0           1       

What trips me up is that the verb prove can appear under more than one key (e.g. for the key be the score is 0.58, and for the key represent the score is 0.66).

If I use df = pd.DataFrame.from_dict(d, orient='index'), the verb prove appears twice as a column name, whereas I want each term to appear only once as a column.

Can someone help?

With the dictionary you provided (as d), you can't use from_dict directly.
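To see why, here is a small sketch. The two-key `d` below is an abbreviated stand-in for the question's data with rounded scores, not the real dictionary. Passed straight to `from_dict`, it should give positional columns whose cells hold the raw tuples, so the verbs never become column labels.

```python
import pandas as pd

# Abbreviated stand-in for the dictionary in the question (assumed values).
d = {
    "be": [("prove", 0.58), ("serve", 0.49)],
    "represent": [("prove", 0.66)],
}

raw = pd.DataFrame.from_dict(d, orient="index")

# Columns are list positions (0, 1, ...), and each cell is a whole tuple.
print(list(raw.columns))
print(raw.loc["be", 0])
```

Because the terms are buried inside the tuples, they first have to be pulled out into labels, which is what both options below do.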

You either need to rework the dictionary so that its values are dictionaries:

pd.DataFrame.from_dict({k: dict(v) for k,v in d.items()}, orient='index')
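As a minimal runnable sketch of that one-liner, assuming a shortened three-key `d` with rounded scores in place of the real data: `dict(v)` turns each list of tuples into a `{term: score}` mapping, so a term shared by several keys ends up in a single column. The `fillna(0)` step is an optional addition, only there to match the zeros in the desired output.

```python
import pandas as pd

# Shortened stand-in for the question's dictionary (assumed values).
d = {
    "be": [("prove", 0.58), ("serve", 0.49)],
    "associate": [("interact", 0.82)],
    "represent": [("prove", 0.66)],
}

# dict(v) converts [("prove", 0.58), ...] into {"prove": 0.58, ...}.
nested = {k: dict(v) for k, v in d.items()}
df = pd.DataFrame.from_dict(nested, orient="index")

# Optional: show 0 instead of NaN where a key has no score for a term.
df = df.fillna(0)
print(df)
```

Note that `dict(v)` keeps only the last score if the same term appears twice under one key, which does not happen in the data shown.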

Or you need to read it as a Series and reshape it:

(pd.Series(d).explode()
   .apply(pd.Series)
   .set_index(0, append=True)[1]
   .unstack(fill_value=0)
)

Output:

              prove     serve    render  represent      lead   replace  \
be         0.583172  0.495128  0.482673   0.477487  0.477256  0.469538   
represent  0.661219       NaN       NaN        NaN       NaN       NaN   
associate       NaN       NaN       NaN        NaN       NaN       NaN   
induce          NaN       NaN       NaN        NaN       NaN       NaN   

           contribute  interact  colocalize  suppress  ...   produce  \
be           0.452982       NaN         NaN       NaN  ...       NaN   
represent         NaN       NaN         NaN       NaN  ...       NaN   
associate         NaN  0.823779    0.683171       NaN  ...       NaN   
induce            NaN       NaN         NaN  0.815911  ...  0.738493   

           attenuate  abrogate   trigger  stimulate     evoke  up-regulate  \
be               NaN       NaN       NaN        NaN       NaN          NaN   
represent        NaN       NaN       NaN        NaN  0.659131     0.658291   
associate        NaN       NaN       NaN        NaN       NaN          NaN   
induce      0.735202  0.726008  0.717865   0.713656       NaN          NaN   

           synergize  activate   mediate  
be               NaN       NaN       NaN  
represent   0.654106  0.651293  0.649428  
associate        NaN       NaN       NaN  
induce           NaN       NaN       NaN  

[4 rows x 24 columns]
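The reshape idea can also be sketched in "long" form, as an alternative to the Series approach rather than the code used above: flatten the dictionary into one record per tuple, then pivot. The column names `key`, `term` and `score` are made up for illustration, and `d` is again a shortened stand-in with rounded scores.

```python
import pandas as pd

# Shortened stand-in for the question's dictionary (assumed values).
d = {
    "be": [("prove", 0.58), ("serve", 0.49)],
    "associate": [("interact", 0.82)],
    "represent": [("prove", 0.66)],
}

# One (key, term, score) record per tuple.
records = [(key, term, score) for key, pairs in d.items() for term, score in pairs]
long_df = pd.DataFrame(records, columns=["key", "term", "score"])

# One row per key, one column per unique term; missing pairs become 0.
wide = long_df.pivot(index="key", columns="term", values="score").fillna(0)
print(wide)
```

Unlike `from_dict`, `pivot` sorts the row and column labels, and it raises a ValueError if one key lists the same term twice.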
