
Inverted indexing

I'm working on inverted indexing, and my question is: in the final step, should we return the total number of documents the word appeared in, or just the ID of each document? For example, if the word "Hello" appeared in 3 documents (document A, document B and document C), should I return 3 or A, B, C?

An index implies that it gives you a lookup of something, not just a number. A frequency count, by contrast, gives you the number of occurrences of a word.

By the way, you can get the number from A, B, C, but not the other way around.
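A minimal sketch of that point, assuming a toy corpus held in a Python dict and naive lowercase/whitespace tokenisation (both are illustrative assumptions, as is the function name `build_inverted_index`). Each word maps to its postings (the set of document IDs), and the count is just the length of that set:

```python
from collections import defaultdict


def build_inverted_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of document IDs it appears in (its postings)."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive tokenisation (an assumption): lowercase, split on whitespace.
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


# Hypothetical sample corpus.
docs = {
    "A": "Hello world",
    "B": "hello there",
    "C": "Say hello",
    "D": "goodbye",
}

index = build_inverted_index(docs)
postings = sorted(index["hello"])
print(postings)       # ['A', 'B', 'C']
print(len(postings))  # 3, derived from the postings
```

Going from `{'A', 'B', 'C'}` to `3` is a `len()` call; going from `3` back to the document IDs is impossible without rescanning the corpus.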

That's totally up to you!

If you only need to return the total number of documents a given word appears in, then you don't even need an inverted index. All you need is a mapping from words to counts, which takes much less computation and space than an inverted index.
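For comparison, here is a rough sketch of that lighter-weight mapping, again assuming a dict-based corpus and whitespace tokenisation (the name `document_frequencies` is hypothetical). Each word is counted at most once per document, so the value is the number of documents containing it, and no document IDs are stored:

```python
from collections import Counter


def document_frequencies(docs: dict[str, str]) -> Counter:
    """Map each word to the number of documents that contain it."""
    df: Counter = Counter()
    for text in docs.values():
        # set() ensures a word repeated inside one document is counted once.
        df.update(set(text.lower().split()))
    return df


# Hypothetical sample corpus; "hello" appears twice in document A.
docs = {"A": "Hello hello world", "B": "hello there", "C": "Say hello"}

df = document_frequencies(docs)
print(df["hello"])  # 3
print(df["world"])  # 1
```

This stores one integer per word instead of one postings set per word, but it can only answer "how many?", never "which ones?".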

If you're working on an Information Retrieval exercise (or building a proof of concept, etc.), it seems to me that you would also need to return the documents in which a given word was found; that's Boolean retrieval.
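To illustrate why the document IDs matter for Boolean retrieval, here is a minimal AND-query sketch under the same toy assumptions (dict corpus, whitespace tokenisation; `build_index` and `boolean_and` are hypothetical names). It intersects the postings of each query term. Textbook implementations usually keep postings as sorted lists and merge them; Python sets are used here only for brevity:

```python
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Inverted index: word -> set of document IDs containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)


def boolean_and(index: dict[str, set[str]], *terms: str) -> set[str]:
    """Return the IDs of documents containing every query term."""
    if not terms:
        return set()
    # Copy the first postings set so the index itself is not mutated.
    result = set(index.get(terms[0].lower(), set()))
    for term in terms[1:]:
        result &= index.get(term.lower(), set())
    return result


# Hypothetical sample corpus.
docs = {"A": "hello world", "B": "hello there", "C": "say hello world"}
index = build_index(docs)

print(sorted(boolean_and(index, "hello", "world")))  # ['A', 'C']
print(boolean_and(index, "hello", "missing"))         # set()
```

An OR query would use set union (`|=`) instead. Neither query can be answered from per-word counts alone.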
