How can we use artificial neural networks to find similar documents?
How can we use an ANN to find similar documents? I know it's a silly question, but I am new to the NLP field.
I have built a model using kNN and a bag-of-words approach to solve my problem. With it I can get n documents (along with their closeness scores) that are somewhat similar to the input, but now I want to implement the same thing using an ANN and I have no idea where to start.
Thanks in advance for any help or suggestions.
You can use "word embeddings", a technique that represents words as dense vectors. Once your documents are represented as vectors, you can find similar ones simply by using cosine similarity.
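To make the idea concrete, here is a minimal sketch in plain Python. It assumes a document vector is the average of its word vectors, which is one common choice and not the only one. The `EMBEDDINGS` table, its values, and the helper names `document_vector` and `most_similar` are all made up for illustration; in practice the vectors would come from a trained model such as word2vec.

```python
import math

# Hypothetical 3-dimensional word vectors, hand-written for illustration only.
# Real embeddings (word2vec, GloVe, FastText) usually have 50 to 300 dimensions.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.9, 0.4],
    "market": [0.1, 0.8, 0.5],
    "price": [0.0, 0.7, 0.6],
}


def document_vector(text):
    """Average the vectors of all known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, documents, n=2):
    """Return the n documents closest to the query as (score, document) pairs."""
    query_vec = document_vector(query)
    if query_vec is None:
        return []
    scored = []
    for doc in documents:
        doc_vec = document_vector(doc)
        if doc_vec is not None:
            scored.append((cosine_similarity(query_vec, doc_vec), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]


documents = ["cat dog", "stock market price", "pet dog"]
results = most_similar("cat pet", documents, n=2)
for score, doc in results:
    print(f"{score:.3f}  {doc}")
```

This gives the same kind of output as your kNN setup (the n closest documents along with a closeness score), only computed on dense vectors instead of bag-of-words counts.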
An example of how to build a word2vec model using TensorFlow. One more example of how to use the embedding layer from Keras.
The way to obtain embeddings for your language is either to train them yourself on a corpus of your choice (large enough, e.g. Wikipedia) or to download pretrained embeddings (for Python there are plenty of sources of embeddings trained with, or loadable by, the gensim module, which is a de facto standard for word2vec in Python).
You can also use GloVe (using glove-python) or FastText word embeddings.
If you are interested, you can find more detailed descriptions of embeddings with code examples and source papers.
Have a look at the paper https://arxiv.org/pdf/1805.10685.pdf, which gives you an overall idea. Check this link for more references: https://github.com/Hironsan/awesome-embedding-models