
Gensim Word2Vec model floating point

I have trained a word2vec model using gensim. In the model's matrix, some values are displayed in scientific notation, like this: "-7.18556e-05"

I need to use the values in the matrix as strings. Is there a way to remove those "e-05", "e-04", etc.?

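import nltk
from gensim.models import Word2Vec
from nltk.corpus import stopwords

# Requires the NLTK "punkt" and "stopwords" data packages to be downloaded.
text = "My text is here"
stop_words = set(stopwords.words('english'))

# Split the text into sentences, then each sentence into words,
# dropping stopwords (iterating a raw sentence would yield characters).
sentences = [
    [word for word in nltk.word_tokenize(sentence) if word not in stop_words]
    for sentence in nltk.sent_tokenize(text)
]

model = Word2Vec(sentences, min_count=1)

# gensim 4.x; on gensim 3.x use list(model.wv.vocab) instead.
words = model.wv.index_to_key

# One row per vocabulary word.
matrix = model.wv[words]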

Note that those scientific-notation printouts are valid strings, and will be understood by Python and by many reading routines that might be used on your output.
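As a quick check of that point, here is a minimal sketch that reads scientific-notation strings back into numbers. The sample value is the one from the question; the second string is made up for illustration, and nothing here depends on gensim.

```python
import numpy as np

s = "-7.18556e-05"  # string form as shown in the question

# Python's built-in float() accepts scientific notation directly.
as_float = float(s)

# numpy can convert an array of such strings as well.
# "1.5e-04" is an illustrative value, not from the question.
as_array = np.array([s, "1.5e-04"]).astype(np.float64)

print(as_float == -7.18556e-05)  # the string round-trips to the same float
print(as_array)
```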

And, when printing for some very specific purpose, there are various formatting options (including the .format() options mentioned in the comments) to get exactly what you need. (You haven't shown which methods of displaying the matrix/array you're currently using, so it's not clear which suggestions for altering the display, at the key output points, would be best.)
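For a single value, a few of those formatting options might look like the sketch below. The choice of 10 decimal places is an assumption for illustration; pick whatever precision your downstream use needs.

```python
import numpy as np

value = -7.18556e-05  # sample value from the question

# Fixed-point formatting with an f-string (10 decimals is an arbitrary choice).
fixed = f"{value:.10f}"

# The same result through str.format().
via_format = "{:.10f}".format(value)

# numpy's helper for a positional (non-scientific) representation.
positional = np.format_float_positional(value)

print(fixed)       # -0.0000718556
print(via_format)
print(positional)
```

Note that a fixed number of decimals rounds very small values toward zero, so the precision should be chosen with the smallest magnitudes in the matrix in mind.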

But also: the vectors and matrices from gensim and most similar libraries are typically numpy arrays, and numpy has a global setting for display options, including a suppress parameter that stops such notation completely. See this other answer for more details:

https://stackoverflow.com/a/2891805/130288
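A minimal sketch of that global setting, using a small stand-in array in place of the real model matrix (the values and the precision of 8 are assumptions for illustration):

```python
import numpy as np

# Stand-in for the word-vector matrix; real gensim vectors are float32 too.
matrix = np.array([[-7.18556e-05, 0.0123],
                   [0.25, -3.1e-04]], dtype=np.float32)

# suppress=True tells numpy to print floats in fixed-point notation.
# This changes display for every array printed afterwards in the process.
np.set_printoptions(suppress=True, precision=8)

text = str(matrix)
print(text)
```

Values smaller than the chosen precision will print as zero, so raise precision if the small components matter.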

Ultimately, you may not want to rely on this having been set globally, at some earlier point, to get your desired output at one specific, intentional place. It would be clearer, more robust code to explicitly format the results for the purpose. But as a quick fix, the above may fit your need.
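One way the explicit approach could look is sketched below. The helper name rows_as_strings and its decimals parameter are hypothetical, and the array is a stand-in for the model matrix. The np.printoptions context manager (numpy 1.15 and later) is shown as a scoped alternative to the global setting.

```python
import numpy as np

def rows_as_strings(matrix, decimals=10):
    """Hypothetical helper: format every value as a fixed-point string."""
    return [[f"{float(v):.{decimals}f}" for v in row] for row in matrix]

# Stand-in for the word-vector matrix.
matrix = np.array([[-7.18556e-05, 0.25]], dtype=np.float32)

# Explicit formatting: the output does not depend on any global state.
rows = rows_as_strings(matrix)
print(rows)

# Scoped display options: only affect printing inside the with-block.
with np.printoptions(suppress=True, precision=8):
    scoped = str(matrix)
print(scoped)
```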
