
How to get a list of words in a doc by TF/IDF scores

I have an ElasticSearch index. Given a document ID in the index, I want to get a list of the words in that document by their TF-IDF scores. Is it possible to write an ES query to get the list?

Thanks in advance.

You could retrieve the list of all terms in the document and then run a search for all of those terms with explain enabled.

Example: if the document contains foo and bar, the query would be:

/MY_INDEX/MY_TYPE/_search?q=_id:MY_ID foo bar&explain=true&size=1
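The query string above contains a colon and spaces, so depending on the client it may need to be percent-encoded before it is sent. A minimal sketch of building that URL in Python follows; the helper name build_explain_url is made up for illustration, and MY_INDEX, MY_TYPE and MY_ID are the same placeholders used in the answer.

```python
from urllib.parse import quote, urlencode


def build_explain_url(index, doc_type, doc_id, terms):
    """Build the _search path with explain=true for one document and its terms.

    Hypothetical helper: it only assembles the URL shown in the answer and
    percent-encodes the query string; it does not talk to a cluster.
    """
    # "_id:MY_ID foo bar" -> the document ID clause followed by every term
    query = " ".join([f"_id:{doc_id}", *terms])
    # quote_via=quote encodes spaces as %20 instead of "+"
    params = urlencode({"q": query, "explain": "true", "size": 1}, quote_via=quote)
    return f"/{index}/{doc_type}/_search?{params}"


if __name__ == "__main__":
    print(build_explain_url("MY_INDEX", "MY_TYPE", "MY_ID", ["foo", "bar"]))
```

The returned path can be appended to the cluster's base address with whatever HTTP client you already use.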

In the score explanation you will see the idf score and the tf score for each word.
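To turn that explanation into a word list, you can walk the explanation tree and pick out the tf and idf nodes for each term. The sketch below is only an illustration: it assumes each node is a dict with "value", "description" and "details" keys, that per-term nodes have descriptions starting with "weight(field:term in N)", and that the tf and idf nodes have descriptions starting with "tf" and "idf". The exact wording of these descriptions differs between ElasticSearch versions and similarity settings, so check it against a real response first. The function names and the sample tree are made up for the example.

```python
import re

# Assumed shape of a per-term node, e.g.
# "weight(body:foo in 0) [PerFieldSimilarity], result of:"
WEIGHT_RE = re.compile(r"weight\((?P<field>[^:]+):(?P<term>\S+) in \d+\)")


def collect_tf_idf(node, current_term=None, out=None):
    """Recursively collect {"tf": ..., "idf": ...} per term from an explanation tree."""
    if out is None:
        out = {}
    description = node.get("description", "")
    match = WEIGHT_RE.match(description)
    if match:
        current_term = match.group("term")
    if current_term is not None:
        # Keep the first (outermost) tf / idf node seen for each term
        if description.startswith("idf"):
            out.setdefault(current_term, {}).setdefault("idf", node["value"])
        elif description.startswith("tf"):
            out.setdefault(current_term, {}).setdefault("tf", node["value"])
    for child in node.get("details") or []:
        collect_tf_idf(child, current_term, out)
    return out


def rank_terms(scores):
    """Sort terms by tf * idf, highest first; a missing factor counts as 0."""
    return sorted(
        scores,
        key=lambda t: scores[t].get("tf", 0.0) * scores[t].get("idf", 0.0),
        reverse=True,
    )


def leaf(value, description):
    """Build a childless node for the made-up sample below."""
    return {"value": value, "description": description, "details": []}


# Made-up explanation tree for illustration only, not real cluster output
SAMPLE = {
    "value": 0.9,
    "description": "sum of:",
    "details": [
        {
            "value": 0.3,
            "description": "weight(body:foo in 0) [PerFieldSimilarity], result of:",
            "details": [leaf(0.6, "idf(docFreq=2, maxDocs=3)"),
                        leaf(0.5, "tf(freq=1.0), with freq of:")],
        },
        {
            "value": 0.6,
            "description": "weight(body:bar in 0) [PerFieldSimilarity], result of:",
            "details": [leaf(1.2, "idf(docFreq=1, maxDocs=3)"),
                        leaf(0.5, "tf(freq=1.0), with freq of:")],
        },
    ],
}

if __name__ == "__main__":
    scores = collect_tf_idf(SAMPLE)
    for term in rank_terms(scores):
        print(term, scores[term])
```

In a real response the tree to pass in would be the explanation attached to the hit for your document; the field name body in the sample is a placeholder.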


 