
word2vec: Order of sentences in the training corpus

Original post: 2016-04-07 11:45:46 · Tags: java, word2vec

I have a question about the word2vec algorithm: does the order of the sentences in the training corpus matter? For example, given two training corpora:

CorpusA: Sentence 1. Sentence 2. Sentence 3.

CorpusB: Sentence 3. Sentence 1. Sentence 2.

Will the results from word2vec be different?

Thanks in advance.

1 Answer

The order of sentences does affect the embeddings learned from the corpus, because most word2vec implementations are trained with stochastic gradient descent (SGD), and each SGD update starts from the weights left by the previous examples.

So the answer to your question is yes: the results of word2vec will be different.
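To make the order dependence concrete, here is a minimal sketch in plain Java. It is not word2vec: the one-parameter objective `(w - x)^2`, the class name `SgdOrderDemo`, the learning rate of 0.1, and the numeric "sentences" are all assumptions chosen for illustration. It only shows that one pass of SGD over the same examples in a different order ends at a different weight.

```java
/** Toy illustration: one pass of SGD depends on the order of the examples. */
final class SgdOrderDemo {

    /**
     * Runs one pass of plain SGD minimising (w - x)^2 for each example x,
     * starting from w = 0. This is a stand-in objective, not the word2vec loss.
     */
    static double train(double[] examples, double learningRate) {
        double w = 0.0;
        for (double x : examples) {
            double gradient = 2.0 * (w - x); // derivative of (w - x)^2 w.r.t. w
            w -= learningRate * gradient;    // one SGD step per example
        }
        return w;
    }

    public static void main(String[] args) {
        double[] corpusA = {1.0, 2.0, 3.0}; // stands in for Sentence 1, 2, 3
        double[] corpusB = {3.0, 1.0, 2.0}; // stands in for Sentence 3, 1, 2
        System.out.println("CorpusA order: w = " + train(corpusA, 0.1));
        System.out.println("CorpusB order: w = " + train(corpusB, 0.1));
    }
}
```

With these inputs the two orderings end at roughly 1.05 and 0.94, even though both saw exactly the same three examples.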

I don't think word2vec is the right algorithm to use if the order of sentences in the corpus is important to you. Keep in mind that the output of word2vec can vary for several reasons, a few of which are:

random initialisation of vectors
negative sampling
multi-threading
floating-point precision of your machine
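Two of these sources can be shown with the Java standard library alone. The sketch below is an illustration under stated assumptions, not code from any word2vec implementation: `ReproducibilityDemo`, `initVector`, and the scaling `(r - 0.5) / dim` are hypothetical choices. It shows that a fixed seed makes random initialisation repeatable, and that floating-point addition is not associative, which is why the order in which threads apply updates can change the result.

```java
import java.util.Arrays;
import java.util.Random;

/** Toy illustration of two sources of run-to-run variation. */
final class ReproducibilityDemo {

    /** Draws small random values centred on zero from a seeded generator. */
    static double[] initVector(long seed, int dim) {
        Random rnd = new Random(seed);
        double[] v = new double[dim];
        for (int i = 0; i < dim; i++) {
            v[i] = (rnd.nextDouble() - 0.5) / dim; // assumed scaling, for illustration
        }
        return v;
    }

    /** Adds three numbers grouping from the left: (a + b) + c. */
    static double sumLeftFirst(double a, double b, double c) {
        return (a + b) + c;
    }

    /** Adds the same three numbers grouping from the right: a + (b + c). */
    static double sumRightFirst(double a, double b, double c) {
        return a + (b + c);
    }

    public static void main(String[] args) {
        // Same seed: identical initial vectors. Different seed: different vectors.
        System.out.println(Arrays.equals(initVector(42L, 8), initVector(42L, 8)));
        System.out.println(Arrays.equals(initVector(42L, 8), initVector(43L, 8)));

        // Same three numbers, different grouping, different double result.
        System.out.println(sumLeftFirst(0.1, 0.2, 0.3));
        System.out.println(sumRightFirst(0.1, 0.2, 0.3));
    }
}
```

Fixing the seed removes only the first source of variation. Whether a given library also makes negative sampling and multi-threaded training repeatable depends on that library, so check its documentation rather than assuming it.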

For better results, we usually run multiple epochs over the training data, which won't be possible in your case.
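As a rough illustration of what multiple epochs do, the sketch below repeats SGD passes over a toy objective with a learning rate that decays linearly to zero. Everything in it is an assumption for illustration: the class `EpochDemo`, the objective `(w - x)^2`, the initial rate of 0.1, and the two example orderings. It is not word2vec, and it says nothing about whether multiple epochs suit a particular use case. It only compares how far apart the two orderings end up after 1 epoch and after 200 epochs.

```java
/** Toy illustration: multi-epoch SGD with a linearly decaying learning rate. */
final class EpochDemo {

    /**
     * Runs `epochs` passes of SGD minimising (w - x)^2 per example, starting
     * from w = 0, with the learning rate decaying linearly towards zero.
     */
    static double train(double[] examples, int epochs, double initialRate) {
        double w = 0.0;
        int totalSteps = epochs * examples.length;
        int step = 0;
        for (int e = 0; e < epochs; e++) {
            for (double x : examples) {
                // Linear decay: the full rate at step 0, close to zero at the end.
                double rate = initialRate * (1.0 - (double) step / totalSteps);
                w -= rate * 2.0 * (w - x); // SGD step on (w - x)^2
                step++;
            }
        }
        return w;
    }

    /** Absolute difference between the final weights of the two orderings. */
    static double orderGap(int epochs) {
        double[] orderA = {1.0, 2.0, 3.0};
        double[] orderB = {3.0, 1.0, 2.0};
        return Math.abs(train(orderA, epochs, 0.1) - train(orderB, epochs, 0.1));
    }

    public static void main(String[] args) {
        System.out.println("gap after   1 epoch : " + orderGap(1));
        System.out.println("gap after 200 epochs: " + orderGap(200));
    }
}
```

In this toy setting the gap between the two orderings is much smaller after 200 epochs than after one, because the late, small-step updates average over many passes. The orderings still do not produce bit-identical results.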


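Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.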

