Create a pandas DataFrame from a dict whose values are lists of tuples, with each column name unique
I have two lists that I use to create a dictionary: list1 contains text data and list2 is a list of (text, float) tuples. The goal is to build a DataFrame in which each row is labeled by an element of list1, each column is named after a unique text term taken from the first tuple element, and each cell holds the float value that connects the row's key to the column's term.
For example, here is the dictionary, with keys such as be, associate, induce, represent and values such as ('prove', 0.583171546459198), ('serve', 0.4951282739639282), etc.:
{'be': [('prove', 0.583171546459198), ('serve', 0.4951282739639282), ('render', 0.4826732873916626), ('represent', 0.47748714685440063), ('lead', 0.47725602984428406), ('replace', 0.4695377051830292), ('contribute', 0.4529820680618286)],
'associate': [('interact', 0.8237789273262024), ('colocalize', 0.6831706762313843)],
'induce': [('suppress', 0.8159114718437195), ('provoke', 0.7866303324699402), ('elicit', 0.7509980201721191), ('inhibit', 0.7498961687088013), ('potentiate', 0.742023229598999), ('produce', 0.7384929656982422), ('attenuate', 0.7352016568183899), ('abrogate', 0.7260081768035889), ('trigger', 0.717864990234375), ('stimulate', 0.7136563658714294)],
'represent': [('prove', 0.6612186431884766), ('evoke', 0.6591314673423767), ('up-regulate', 0.6582908034324646), ('synergize', 0.6541063785552979), ('activate', 0.6512928009033203), ('mediate', 0.6494284272193909)]}
Desired output:
            prove  serve  render  represent
be           0.58   0.49    0.48       0.47
associate    0      0       0          0
induce       0.45   0.58    0.9        0.7
represent    0.66   0       0          1
What trips me up is that the verb prove can appear under more than one key (i.e. for the key be the score is 0.58, and for the key represent the score is 0.66).
If I use df = pd.DataFrame.from_dict(d, orient='index'), the verb prove appears twice as a column name, whereas I want each term to appear only once, as a single column.
Can someone help?
With the dictionary that you provided (as d), you can't use from_dict directly.
You either need to rework the dictionary so that its values are dictionaries:
pd.DataFrame.from_dict({k: dict(v) for k,v in d.items()}, orient='index')
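To get closer to the desired output, where missing pairs show 0 rather than NaN, this first option can be followed by fillna(0). A minimal sketch, assuming a shortened, rounded sample of d (the values below are illustrative, not the full dictionary, and the names nested and pairs are only for this example):

```python
import pandas as pd

# Shortened, rounded sample of the dictionary from the question (illustrative).
d = {
    "be": [("prove", 0.58), ("serve", 0.49)],
    "associate": [("interact", 0.82)],
    "represent": [("prove", 0.66), ("evoke", 0.66)],
}

# Turn each list of (term, score) tuples into a {term: score} dict, so that
# a term shared by several keys maps to a single column.
nested = {key: dict(pairs) for key, pairs in d.items()}

# Rows are the outer keys, columns are the unique terms. Pairs that do not
# exist come out as NaN, which fillna(0) replaces with 0.
df = pd.DataFrame.from_dict(nested, orient="index").fillna(0)
print(df)
```

Note that dict(pairs) keeps only the last score if the same term occurs twice within one list, so this assumes terms are unique per key.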
Or you need to read it as a Series and reshape it:
(pd.Series(d).explode()
.apply(pd.Series)
.set_index(0, append=True)[1]
.unstack(fill_value=0)
)
Output:
              prove     serve    render  represent      lead   replace  \
be         0.583172  0.495128  0.482673   0.477487  0.477256  0.469538
represent  0.661219       NaN       NaN        NaN       NaN       NaN
associate       NaN       NaN       NaN        NaN       NaN       NaN
induce          NaN       NaN       NaN        NaN       NaN       NaN

           contribute  interact  colocalize  suppress  ...   produce  \
be           0.452982       NaN         NaN       NaN  ...       NaN
represent         NaN       NaN         NaN       NaN  ...       NaN
associate         NaN  0.823779    0.683171       NaN  ...       NaN
induce            NaN       NaN         NaN  0.815911  ...  0.738493

           attenuate  abrogate   trigger  stimulate     evoke  up-regulate  \
be               NaN       NaN       NaN        NaN       NaN          NaN
represent        NaN       NaN       NaN        NaN  0.659131     0.658291
associate        NaN       NaN       NaN        NaN       NaN          NaN
induce      0.735202  0.726008  0.717865   0.713656       NaN          NaN

           synergize  activate   mediate
be               NaN       NaN       NaN
represent   0.654106  0.651293  0.649428
associate        NaN       NaN       NaN
induce           NaN       NaN       NaN

[4 rows x 24 columns]
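The same reshaping can also be sketched with a long-format table and DataFrame.pivot. This is a different technique from the ones shown in the answer, offered only as an alternative; the sample values are shortened and rounded, and the column names key, term, and score are assumptions made for this example:

```python
import pandas as pd

# Shortened, rounded sample of the dictionary from the question (illustrative).
d = {
    "be": [("prove", 0.58), ("serve", 0.49)],
    "associate": [("interact", 0.82)],
    "represent": [("prove", 0.66), ("evoke", 0.66)],
}

# Flatten to one (key, term, score) record per tuple: a "long" table.
records = [(key, term, score) for key, pairs in d.items() for term, score in pairs]
long_df = pd.DataFrame(records, columns=["key", "term", "score"])

# Spread the terms into columns. pivot raises a ValueError if the same
# (key, term) pair occurs more than once, so duplicates surface early.
wide = long_df.pivot(index="key", columns="term", values="score").fillna(0)
print(wide)
```

Unlike the dict-based option, pivot fails loudly on a repeated term within one key instead of silently keeping the last score, which may or may not be what you want.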