
Positional inverted index in MongoDB

I wonder how a positional inverted index could be implemented in MongoDB. An inverted index can be created with the multikey feature, but how can the positions of the occurrences be stored efficiently as well?

Let's say we have this object

obj = {
  name: "Apollo",
  text: "Some text about Apollo moon landings",
  keywords: [ "some", "text", "about", "apollo", "moon", "landings" ]
}

I would now like to be able to make a query where "apollo" and "landings" have to be connected (near each other in the text), and not just make an "intersection" query.

What about an object like:

obj = {
  name: "Apollo",
  text: "Some text about Apollo moon landings",
  keywords: [
    {idx:0, text: "some"},
    {idx:1, text: "text"}, 
    {idx:2, text: "about"}, 
    {idx:3, text: "apollo"}, 
    {idx:4, text: "moon"}, 
    {idx:5, text: "landings"}
  ]
}

You can do an ensureIndex on "keywords.text" so that a query for documents in which both keywords exist can use the index, and then use JavaScript in a $where filter to check the relative positions of the input keywords.
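The position check itself can be sketched in plain JavaScript, independent of the database. The helper names toPositionalKeywords and occursWithin and the maxGap parameter are hypothetical, and the tokenizer simply lowercases and splits on non-word characters, which is an assumption that only suits ASCII text like the example.

```javascript
// Hypothetical helper: turn text into lowercase tokens tagged with their
// position, matching the {idx, text} shape suggested above.
function toPositionalKeywords(text) {
  return text
    .toLowerCase()
    .split(/\W+/) // assumption: words are separated by non-word characters
    .filter(function (token) { return token.length > 0; })
    .map(function (token, i) { return { idx: i, text: token }; });
}

// Hypothetical helper: true when `second` occurs between 1 and `maxGap`
// positions after some occurrence of `first`.
function occursWithin(keywords, first, second, maxGap) {
  var firstPositions = keywords
    .filter(function (k) { return k.text === first; })
    .map(function (k) { return k.idx; });
  return keywords.some(function (k) {
    return k.text === second && firstPositions.some(function (p) {
      var gap = k.idx - p;
      return gap >= 1 && gap <= maxGap;
    });
  });
}

var obj = {
  name: "Apollo",
  text: "Some text about Apollo moon landings"
};
obj.keywords = toPositionalKeywords(obj.text);

// "apollo" is at position 3 and "landings" at position 5, so the gap is 2.
console.log(occursWithin(obj.keywords, "apollo", "landings", 2)); // true
console.log(occursWithin(obj.keywords, "apollo", "landings", 1)); // false
```

Inside a $where filter the same comparison would run against this.keywords, and its logic would probably need to be written inline in the $where function, since helpers defined in the shell are not available on the server. The indexed condition on "keywords.text" should stay in the query so that the JavaScript only runs on the documents that already contain both words.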

You can use either the $and or the $all operator to do what I believe you are looking to accomplish.

Given your example document:

> db.test.find().pretty()
{
    "_id" : ObjectId("4f26b716c27b085280a45a29"),
    "name" : "Apollo",
    "text" : "Some text about Apollo moon landings",
    "keywords" : [
        "some",
        "text",
        "about",
        "apollo",
        "moon",
        "landings"
    ]
}

You can use the $and operator to search for a document whose "keywords" array contains both words.

> db.test.find({$and:[{keywords:"apollo"}, {keywords:"landings"}]})
{ "_id" : ObjectId("4f26b716c27b085280a45a29"), "name" : "Apollo", "text" : "Some text about Apollo moon landings", "keywords" : [ "some", "text", "about", "apollo", "moon", "landings" ] }
> 

The $all operator will return the same result, and the query is a little more streamlined:

> db.test.find({keywords:{$all:["apollo", "landings"]}})
{ "_id" : ObjectId("4f26b716c27b085280a45a29"), "name" : "Apollo", "text" : "Some text about Apollo moon landings", "keywords" : [ "some", "text", "about", "apollo", "moon", "landings" ] }

If we put an index on the keywords array, both queries make use of it.

> db.test.ensureIndex({keywords:1})
> db.test.find({$and:[{keywords:"apollo"}, {keywords:"landings"}]}).explain()
{
    "cursor" : "BtreeCursor keywords_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "keywords" : [
            [
                "apollo",
                "apollo"
            ]
        ]
    }
}
> db.test.find({keywords:{$all:["apollo", "landings"]}}).explain()
{
    "cursor" : "BtreeCursor keywords_1",
    "nscanned" : 1,
    "nscannedObjects" : 1,
    "n" : 1,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : true,
    "indexOnly" : false,
    "indexBounds" : {
        "keywords" : [
            [
                "apollo",
                "apollo"
            ]
        ]
    }
}
> 

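In both explain outputs the cursor is "BtreeCursor keywords_1", so both queries make use of the keywords index. Note that the index bounds cover only "apollo", which suggests that the index is used to locate candidates for one term and the other term is then checked against the scanned documents.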
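One caveat worth keeping in mind: both $and and $all test only whether each word is present in the array. They do not look at order or distance. A plain JavaScript model of that membership test (a sketch of the semantics, not MongoDB's implementation; containsAll and the second sample array are made up for illustration) shows that a document with the words far apart and in reverse order matches just the same.

```javascript
// Hypothetical model of the membership test performed by $all:
// every requested term must appear somewhere in the array.
function containsAll(keywords, terms) {
  return terms.every(function (term) {
    return keywords.indexOf(term) !== -1;
  });
}

// The keywords array from the example document.
var original = ["some", "text", "about", "apollo", "moon", "landings"];

// A made-up array with the two words in reverse order and far apart.
var reordered = ["landings", "on", "the", "moon", "after", "apollo"];

console.log(containsAll(original, ["apollo", "landings"]));  // true
console.log(containsAll(reordered, ["apollo", "landings"])); // true
console.log(containsAll(original, ["apollo", "mars"]));      // false
```

If the two words have to be adjacent or within some distance of each other, the positions need to be stored and compared separately from this membership query.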

For more information on the different types of queries, please refer to the "Advanced Queries" document.
http://www.mongodb.org/display/DOCS/Advanced+Queries

For more information on how indexing works in Mongo, please refer to the "Indexing" document.
http://www.mongodb.org/display/DOCS/Indexes#Indexes-IndexingArrayElements

The "Indexing Array Elements" section links to the documentation on MultiKeys. http://www.mongodb.org/display/DOCS/Multikeys

If you are unfamiliar with the .explain function of MongoDB, it is explained here:
http://www.mongodb.org/display/DOCS/Explain
In a nutshell, it displays which indexes your query is using and how many documents had to be accessed in order to return the relevant ones.

Finally, your question seems similar to that of another user who was asking about searching for values in arrays earlier this morning. Perhaps this will be relevant to you as well.
http://groups.google.com/group/mongodb-user/browse_thread/thread/38f30a56094d9e3e

Hopefully, this will help you to write the query that you are looking for. Please let us know if you have any follow-up questions!
