
How can we use artificial neural networks to find similar documents?

How can we use an ANN to find similar documents? I know it's a silly question, but I am new to the NLP field. I have built a model using kNN and a bag-of-words approach to solve my problem. With it I can get the n documents most similar to the input, along with their closeness scores. Now I want to implement the same thing using an ANN, and I have no idea where to start.

Thanks in advance for any help or suggestions.

You can use "word embeddings", a technique that represents words as dense vectors. Once each document is represented as a vector (for example, by averaging the vectors of its words), you can find similar documents by comparing those vectors with cosine similarity.
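As a rough illustration of the idea, here is a minimal sketch in plain Python. The three-dimensional vectors in `EMBEDDINGS` are made-up toy values, not output from any trained model, and the helper names (`document_vector`, `cosine_similarity`, `most_similar`) are hypothetical. It assumes a document vector is the average of its word vectors, which is one common choice rather than the only one.

```python
import math

# Toy word vectors invented for illustration only;
# real embeddings would come from a trained model.
EMBEDDINGS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.0],
    "pet": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.1, 0.9],
    "market": [0.1, 0.0, 0.8],
    "price": [0.0, 0.2, 0.7],
}
DIM = 3


def document_vector(text):
    """Average the vectors of all known words in the text."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM  # no known words: fall back to the zero vector
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query, documents, n=2):
    """Return the n documents closest to the query, with their scores."""
    query_vec = document_vector(query)
    scored = [(doc, cosine_similarity(query_vec, document_vector(doc)))
              for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]


documents = ["cat dog", "stock market price", "pet dog"]
results = most_similar("cat pet", documents, n=2)
for doc, score in results:
    print(f"{score:.3f}  {doc}")
```

This mirrors the kNN setup from the question: only the document representation changes, from bag-of-words counts to averaged embeddings.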

There is an example of how to build a word2vec model using TensorFlow, and another example of how to use the Embedding layer from Keras.

To obtain embeddings for your language, you can either train them yourself on a corpus of your choice (it should be large enough, e.g. Wikipedia) or download pretrained embeddings. For Python there are plenty of sources for embeddings that were trained with, or can be loaded by, the gensim module, which is a de facto standard for word2vec in Python.

You can also use GloVe (via glove-python) or FastText word embeddings.

If you are interested, you can find more detailed descriptions of embeddings, with code examples and source papers.

Have a look at the paper https://arxiv.org/pdf/1805.10685.pdf, which gives you an overall idea. For more references, check https://github.com/Hironsan/awesome-embedding-models
