
How can we use artificial neural networks to find similar documents?

How can we use an ANN to find similar documents? I know it's a silly question, but I am new to the NLP field. I have built a model using kNN and a bag-of-words approach to solve my problem. With it I can retrieve n documents (along with their closeness scores) that are somewhat similar to the input. Now I want to implement the same thing using an ANN, and I don't know where to start.

Thanks in advance for any help or suggestions.

You can use word embeddings, a technique that represents words as dense vectors. To find similar documents once they are represented as vectors, you can simply use cosine similarity.
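As a rough illustration of the idea, here is a minimal sketch in plain Python. It assumes a document vector is the average of its word vectors, which is one common choice and not the only one. The tiny `EMBEDDINGS` table and the helper names (`document_vector`, `cosine_similarity`, `most_similar`) are made up for the example; real vectors would come from a trained model.

```python
import math

# Toy 3-dimensional embeddings, made up for illustration only.
# Real embeddings would come from a trained model (word2vec, GloVe, ...).
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.8],
}


def document_vector(tokens, embeddings=EMBEDDINGS):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    dim = len(next(iter(embeddings.values())))
    if not vectors:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_tokens, corpus, top_n=2):
    """Return (doc_id, similarity) pairs, most similar first."""
    query = document_vector(query_tokens)
    scored = [
        (doc_id, cosine_similarity(query, document_vector(tokens)))
        for doc_id, tokens in corpus.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


corpus = {
    "pets": ["cat", "pet"],
    "finance": ["stock", "market"],
}
ranking = most_similar(["dog"], corpus)
print(ranking)  # the "pets" document ranks first
```

This mirrors the kNN setup from the question: the ranking step is the same, only the bag-of-words vectors are swapped for averaged embeddings.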

Here is an example of how to build a word2vec model using TensorFlow, and another example of how to use the Embedding layer from Keras.

You can obtain embeddings for your language either by training them yourself on a corpus of your choice (it should be large enough, e.g. Wikipedia) or by downloading pre-trained embeddings. For Python there are plenty of sources of embeddings that were trained with, or can be loaded by, the gensim module, which is a de facto standard for word2vec in Python.

You can also use GloVe (using glove-python) or FastText word embeddings.

If you are interested, you can find more detailed descriptions of embeddings, with code examples and source papers.

Have a look at the paper https://arxiv.org/pdf/1805.10685.pdf, which gives you an overall idea. Check this link for more references: https://github.com/Hironsan/awesome-embedding-models

