
Manipulate Python Dictionary into Pandas Dataframe

I have a word vector object from gensim's word2vec package. I can access the 'usernames' using model.wv.vocab and the vectors using model.wv[w].

Here's a sample of what I'm working with:

for w in sample:
    print("ID:", w)
    print("Vector subset: \n", model.wv[w][:10])

ID: 1843
Vector subset: 
 [ 0.08228672 -0.32398582 -0.16024925  0.44939137 -0.28749713  0.25965428
 -0.18141621  0.06290377  0.1270649   0.40421844]
ID: 866
Vector subset: 
 [-0.21120088  0.10489845  0.17965898  0.18383555 -0.24510185 -0.00716993
 -0.18718664  0.3398481   0.07536748 -0.5193063 ]
ID: 2819
Vector subset: 
 [ 0.33056906  0.20122662  0.0239714   0.1846028  -0.1632814  -0.4005747
 -0.02339112  0.22077617  0.20608544 -0.12747312]
ID: 4091
Vector subset: 
 [ 0.5139592   0.1325652  -0.19846869  0.02061795 -0.72117347 -0.5065503
 -0.2806759   0.13045706  0.5880965  -0.497771  ]
ID: 4871
Vector subset: 
 [-0.30731577  0.10253543  0.01026379  0.24779265  0.3701798  -0.16493073
  0.07395677 -0.4943776   0.02144529 -0.12544158]
ID: 6557
Vector subset: 
 [-0.01380698  0.03429209  0.11136885  0.10298727 -0.09034968 -0.09744099
  0.04731373  0.12851992  0.5266305  -0.14707205]
ID: 4691
Vector subset: 
 [-0.12838683  0.34491533  0.10016204 -0.00582217 -0.1514073   0.13864768
  0.05341618 -0.15653287  0.37432986  0.09268643]
ID: 409
Vector subset: 
 [ 0.01493216  0.06893755  0.10319904 -0.08454162 -0.08191169 -0.16257484
 -0.10028194 -0.02943738  0.3722616  -0.27091444]
ID: 8229
Vector subset: 
 [-0.72491664  0.28790048  0.04535258  0.57867676 -0.09895556 -0.01902669
 -0.03930351  0.551734   -0.2825539   0.1426454 ]
ID: 5222
Vector subset: 
 [-0.05142907 -0.3080357  -0.00205866 -0.02018788 -0.07856932 -0.46743438
 -0.29095295  0.44115666  0.34238762  0.2151215 ]

I need to manipulate this information into a form that looks like the dataframe below, so that I can pass it into a script:

    username        1       2       3       4       5   6
          00    0.023   0.232   -0.13   0.2424  -0.242  -0.22
          01    0.001   0.013   -0.232  0.3232  0.2324  -0.023234
          02    0.244   -0.24   -0.3555 0.444   -0.22   -0.2342
          03    0.5333  -0.99   -0.9242 -0.43   0.242   0.423

My current idea is to create a dictionary of usernames and transposed vectors, and then create a dataframe from the dictionary.

vect_dict = {}
for w in model.wv.vocab:
    reshaped_vec = np.reshape(model.wv[w], (300, 1)).T
    vect_dict[w] = reshaped_vec

However, this won't give me a separate column for usernames, with each row holding a transposed vector and each column holding the i-th component of that vector.

How can I manipulate my given data into this form?

Thank you!

You can transpose dataframes, which might make this simpler. I forget whether model.wv can simply be treated as a dictionary, but even if it can't, the following will work:

import pandas as pd
# Map each username to its vector; after .T each username becomes one row.
vect_dict = {w: model.wv[w] for w in model.wv.vocab}
dataframe = pd.DataFrame(vect_dict).T

Transposing turns each dictionary key into a row, which looks like the following:

In [1]: pd.DataFrame({'a': [1,2,3], 'b': [2,3,4]}).T
Out[1]:
   0  1  2
a  1  2  3
b  2  3  4
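
To get from there to the exact layout in the question (a separate username column, with vector components numbered from 1), something like the following sketch may help. It uses a small hypothetical dictionary of random vectors as a stand-in for the gensim model, so the names and values are illustrative only. With a real model the dictionary would be built as shown above; note that in gensim 4.0 and later, model.wv.vocab was removed and model.wv.key_to_index plays that role.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the model: username -> vector.
# With a real model: {w: model.wv[w] for w in model.wv.vocab}
# (or model.wv.key_to_index on gensim 4.0+).
rng = np.random.default_rng(0)
vect_dict = {name: rng.normal(size=6) for name in ["00", "01", "02", "03"]}

# orient="index" makes each dictionary key a row, so no transpose is needed.
df = pd.DataFrame.from_dict(vect_dict, orient="index")

# Label the vector components 1..n instead of 0..n-1.
df.columns = range(1, df.shape[1] + 1)

# Move the usernames out of the index into a regular "username" column.
df = df.rename_axis("username").reset_index()

print(df)
```

Using from_dict with orient="index" is equivalent to building the frame and then calling .T; it just skips the intermediate wide frame, which can matter when the vocabulary is large.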
