Manipulate Python Dictionary into Pandas Dataframe
I have a word vector object from gensim's word2vec package. I can access each 'username' using model.wv.vocab and its vector using model.wv[w].
Here's a sample of what I'm working with:
for w in sample:
    print("ID:", w)
    print("Vector subset: \n", model.wv[w][:10])
ID: 1843
Vector subset:
[ 0.08228672 -0.32398582 -0.16024925 0.44939137 -0.28749713 0.25965428
-0.18141621 0.06290377 0.1270649 0.40421844]
ID: 866
Vector subset:
[-0.21120088 0.10489845 0.17965898 0.18383555 -0.24510185 -0.00716993
-0.18718664 0.3398481 0.07536748 -0.5193063 ]
ID: 2819
Vector subset:
[ 0.33056906 0.20122662 0.0239714 0.1846028 -0.1632814 -0.4005747
-0.02339112 0.22077617 0.20608544 -0.12747312]
ID: 4091
Vector subset:
[ 0.5139592 0.1325652 -0.19846869 0.02061795 -0.72117347 -0.5065503
-0.2806759 0.13045706 0.5880965 -0.497771 ]
ID: 4871
Vector subset:
[-0.30731577 0.10253543 0.01026379 0.24779265 0.3701798 -0.16493073
0.07395677 -0.4943776 0.02144529 -0.12544158]
ID: 6557
Vector subset:
[-0.01380698 0.03429209 0.11136885 0.10298727 -0.09034968 -0.09744099
0.04731373 0.12851992 0.5266305 -0.14707205]
ID: 4691
Vector subset:
[-0.12838683 0.34491533 0.10016204 -0.00582217 -0.1514073 0.13864768
0.05341618 -0.15653287 0.37432986 0.09268643]
ID: 409
Vector subset:
[ 0.01493216 0.06893755 0.10319904 -0.08454162 -0.08191169 -0.16257484
-0.10028194 -0.02943738 0.3722616 -0.27091444]
ID: 8229
Vector subset:
[-0.72491664 0.28790048 0.04535258 0.57867676 -0.09895556 -0.01902669
-0.03930351 0.551734 -0.2825539 0.1426454 ]
ID: 5222
Vector subset:
[-0.05142907 -0.3080357 -0.00205866 -0.02018788 -0.07856932 -0.46743438
-0.29095295 0.44115666 0.34238762 0.2151215 ]
I need to manipulate this information into a form that looks like the dataframe below to pass into a script:
username 1 2 3 4 5 6
00 0.023 0.232 -0.13 0.2424 -0.242 -0.22
01 0.001 0.013 -0.232 0.3232 0.2324 -0.023234
02 0.244 -0.24 -0.3555 0.444 -0.22 -0.2342
03 0.5333 -0.99 -0.9242 -0.43 0.242 0.423
My current idea was to create a dictionary of usernames and transposed vectors, and then create a dataframe from the dictionary.
vect_dict = {}
for w in model.wv.vocab:
    reshaped_vec = np.reshape(model.wv[w], (300, 1)).T
    vect_dict[w] = reshaped_vec
However, this doesn't give me a separate username column, with each row holding a transposed vector and each column holding the ith element of that vector.
How can I manipulate my given data into this form?
Thank you!
You can transpose dataframes, which might make this simpler. I forget whether model.wv supports simply being treated as a dictionary, but even if it doesn't, the following will work:
vect_dict = {w: model.wv[w] for w in model.wv.vocab}
dataframe = pd.DataFrame(vect_dict).T
Transposing a small example dataframe looks like the following:
In [1]: pd.DataFrame({'a': [1,2,3], 'b': [2,3,4]}).T
Out[1]:
0 1 2
a 1 2 3
b 2 3 4
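If the usernames need to be a regular column rather than the index, as in the target layout in the question, one possible approach is sketched below. It uses a plain dict of short made-up vectors as a stand-in for the gensim model, so the names `vectors` and `df` and the sample values are assumptions for illustration only. It also uses `DataFrame.from_dict` with `orient="index"` in place of the transpose.

```python
import numpy as np
import pandas as pd

# Stand-in for the gensim model: a plain dict mapping username -> vector.
# With a real model this dict could be built as
# {w: model.wv[w] for w in model.wv.vocab}.
vectors = {
    "1843": np.array([0.1, -0.2, 0.3]),
    "866": np.array([-0.4, 0.5, 0.6]),
}

# orient="index" turns each key into a row, so no transpose is needed.
df = pd.DataFrame.from_dict(vectors, orient="index")

# Number the vector columns from 1, as in the target layout.
df.columns = range(1, df.shape[1] + 1)

# Name the index "username" and move it into a regular column.
df = df.rename_axis("username").reset_index()

print(df)
```

Note that in gensim 4.0 and later, `model.wv.vocab` was removed. There, iterating over `model.wv.key_to_index` should play the same role when building the dict.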