gensim word2vec: Find number of words in vocabulary
The vocabulary is in the vocab field of the Word2Vec model's wv property, as a dictionary, with the keys being each token (word). So it's just the usual Python for getting a dictionary's length:
len(w2v_model.wv.vocab)
(In older gensim versions before 0.13, vocab appeared directly on the model. So you would use w2v_model.vocab instead of w2v_model.wv.vocab.)
Gojomo's answer raises an AttributeError for Gensim 4.0.0+.
For these versions, you can get the length of the vocabulary as follows:
len(w2v_model.wv.index_to_key)
(which is slightly faster than: len(w2v_model.wv.key_to_index))
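If the same code has to run against several gensim versions, one option is a small helper that tries each attribute mentioned above in turn. This is only a sketch: vocabulary_size is a hypothetical name, not part of gensim, and the SimpleNamespace objects below are stand-ins for real models so that the snippet runs without gensim installed.

```python
from types import SimpleNamespace


def vocabulary_size(model):
    """Return the number of tokens known to a Word2Vec-style model.

    Hypothetical helper, not a gensim function. It tries, in order:
    wv.index_to_key (Gensim 4.0.0+), wv.vocab (before 4.0), and
    model.vocab (before 0.13, when there was no wv property).
    """
    # Very old models have no .wv, so fall back to the model itself.
    wv = getattr(model, "wv", model)
    for name in ("index_to_key", "vocab"):
        try:
            return len(getattr(wv, name))
        except AttributeError:
            continue
    raise AttributeError("no known vocabulary attribute found on model")


# Stand-in objects that only mimic the attribute layout of each version.
gensim4_like = SimpleNamespace(wv=SimpleNamespace(index_to_key=["a", "b", "c"]))
gensim3_like = SimpleNamespace(wv=SimpleNamespace(vocab={"a": 0, "b": 1}))
legacy_like = SimpleNamespace(vocab={"a": 0})

print(vocabulary_size(gensim4_like))
print(vocabulary_size(gensim3_like))
print(vocabulary_size(legacy_like))
```

With a real model you would call vocabulary_size(w2v_model) instead of passing the stand-ins.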
One more way to get the vocabulary size is from the embedding matrix itself, as in:
In [33]: from gensim.models import Word2Vec
# load the pretrained model
In [34]: model = Word2Vec.load(pretrained_model)
# get the shape of embedding matrix
In [35]: model.wv.vectors.shape
Out[35]: (662109, 300)
# `vocabulary_size` is just the number of rows (i.e. axis 0)
In [36]: model.wv.vectors.shape[0]
Out[36]: 662109
Latest:
Use model.wv.key_to_index after creating the gensim model.
The vocab dict became key_to_index for looking up a key's integer index, or get_vecattr() and set_vecattr() for other per-key attributes: https://github.com/RaRe-Technologies/gensim/wiki/Migrating-from-Gensim-3.x-to-4#4-vocab-dict-became-key_to_index-for-looking-up-a-keys-integer-index-or-get_vecattr-and-set_vecattr-for-other-per-key-attributes
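To see why either attribute gives the same count, it may help to picture index_to_key as a list of tokens and key_to_index as the dict that maps each token back to its position. The snippet below is a plain-Python illustration with made-up tokens, not gensim objects:

```python
# Illustrative data only: three made-up tokens in index order.
index_to_key = ["the", "cat", "sat"]  # position -> token

# The reverse mapping: token -> position.
key_to_index = {key: i for i, key in enumerate(index_to_key)}

# Both structures hold one entry per token, so their lengths agree.
vocab_size = len(index_to_key)
print(vocab_size, len(key_to_index), key_to_index["cat"])
```

On a real Gensim 4 model the equivalent lookups would be model.wv.index_to_key and model.wv.key_to_index.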