
Indexing a big collection for faster data retrieval

I have a collection of items with over 10 million records. The records are indexed like this (feel free to criticize if this is incorrect):

        # Compound indexes on car_id and timestamp, in several key orders and directions
        cars.create_index([('car_id', 1), ('timestamp', -1)], unique=True)
        cars.create_index([('car_id', 1), ('timestamp', 1)])
        cars.create_index([('timestamp', -1), ('car_id', 1)])
        cars.create_index([('timestamp', 1), ('car_id', 1)])
        # Single-field indexes
        cars.create_index([('timestamp', 1)])
        cars.create_index([('timestamp', -1)])
        cars.create_index([('car_id', 1)])

My objective is fast retrieval of each car and its latest timestamp (that document holds some important values). A document is uniquely identified by the combination of car_id and timestamp.

Problem: Consider the following query: get all unique cars (there are ~2,000) with their latest timestamp. Since car_id and timestamp are indexed, I would intuitively expect very fast retrieval; however, it takes over 65 seconds (on a 4-core CPU with 32 GB RAM) to get the data.

Question: How can I improve the indexing so that retrieval is much faster?

To be precise, the query I'm actually running differs slightly from the request above, though it is still close: it asks for each car_id with its last 2 records.
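To pin down what the result should be, here is a minimal in-memory sketch in Python (the language of the question's own code) of "each car_id with its latest n records". It assumes documents are plain dicts carrying the car_id and timestamp fields from the question; the function name latest_per_car and the sample data are hypothetical. It sorts by car_id ascending and timestamp descending, then keeps the first n documents of each group.

```python
from itertools import groupby, islice
from operator import itemgetter


def latest_per_car(docs, n=1):
    """Return {car_id: [up to n most recent docs, newest first]}."""
    # Two stable sorts: timestamp descending first, then car_id ascending,
    # which yields the combined order (car_id asc, timestamp desc).
    ordered = sorted(docs, key=itemgetter("timestamp"), reverse=True)
    ordered.sort(key=itemgetter("car_id"))
    # groupby requires input sorted by the grouping key, which it now is;
    # islice keeps only the first n documents of each car's group.
    return {
        car_id: list(islice(group, n))
        for car_id, group in groupby(ordered, key=itemgetter("car_id"))
    }


if __name__ == "__main__":
    sample = [
        {"car_id": "a", "timestamp": 1},
        {"car_id": "b", "timestamp": 5},
        {"car_id": "a", "timestamp": 3},
        {"car_id": "a", "timestamp": 2},
    ]
    for car_id, records in latest_per_car(sample, n=2).items():
        print(car_id, [r["timestamp"] for r in records])
    # prints:
    # a [3, 2]
    # b [5]
```

Two stable sorts are used instead of a single key such as (car_id, -timestamp) so that the sketch also works when timestamp is a datetime, which cannot be negated.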

Indeed, having all these indexes is mostly a waste of disk space. For example, { car_id: 1 } is already covered by the compound indexes that start with car_id, and { timestamp: 1 } and { timestamp: -1 } are redundant with each other, because MongoDB can traverse an index in either direction. Your query selects all documents in the collection, so an index will not help: fetching 10 million records simply takes time. An index only helps you find specific documents. As a rule of thumb: if your query selects more than 10% of all data, a full collection scan (a "full table scan" in relational databases) is usually faster.
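Which of the question's indexes are redundant can be checked mechanically. The sketch below is a hypothetical helper (not part of pymongo or MongoDB) that flags a non-unique index when its keys, or their direction-reversed form, are a strict prefix of another index's keys. It relies on MongoDB being able to use a compound index for queries on its prefixes and to walk an index backwards; it ignores other factors such as index size, so treat its output as candidates to review rather than a definitive drop list.

```python
def prefix_redundant(indexes):
    """Return names of non-unique indexes made redundant by a longer index.

    indexes maps an index name to (keys, unique), where keys is a list of
    (field, direction) pairs in the form pymongo's create_index accepts.
    """

    def reverse(keys):
        # An index can be traversed backwards, so flipping every direction
        # describes the same ordering read in reverse.
        return [(field, -direction) for field, direction in keys]

    redundant = []
    for name, (keys, unique) in indexes.items():
        if unique:
            continue  # a unique index also enforces a constraint; keep it
        for other, (other_keys, _) in indexes.items():
            if other == name or len(other_keys) <= len(keys):
                continue  # only a strictly longer index can cover this one
            prefix = other_keys[: len(keys)]
            if prefix == keys or prefix == reverse(keys):
                redundant.append(name)
                break
    return redundant


# The seven indexes from the question, keyed by MongoDB's default index names.
question_indexes = {
    "car_id_1_timestamp_-1": ([("car_id", 1), ("timestamp", -1)], True),
    "car_id_1_timestamp_1": ([("car_id", 1), ("timestamp", 1)], False),
    "timestamp_-1_car_id_1": ([("timestamp", -1), ("car_id", 1)], False),
    "timestamp_1_car_id_1": ([("timestamp", 1), ("car_id", 1)], False),
    "timestamp_1": ([("timestamp", 1)], False),
    "timestamp_-1": ([("timestamp", -1)], False),
    "car_id_1": ([("car_id", 1)], False),
}

print(prefix_redundant(question_indexes))
# prints: ['timestamp_1', 'timestamp_-1', 'car_id_1']
```

The remaining compound indexes are not flagged; whether they are worth keeping depends on which queries actually filter or sort in those orders.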

However, based on the MongoDB documentation section "Optimization to Return the First Document of Each Group", try this:

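db.collection.createIndex({ car_id: 1, timestamp: -1 })

db.collection.aggregate([
  // Sort by car, newest first; this sort can use the index above
  { $sort: { car_id: 1, timestamp: -1 } },
  {
    // Keep values from the first (i.e. latest) document of each car
    $group: {
      _id: "$car_id",
      k_value: { $first: "$k_value" },
      timestamp: { $first: "$timestamp" }
    }
  }
])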
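Since the question's own code uses pymongo, the same pipeline can be built in Python. This is a minimal sketch assuming the field names used in the answer (car_id, timestamp, k_value); the helper name latest_doc_pipeline and the database and collection names in the usage comment are hypothetical. It only builds the pipeline as plain data, so it can be inspected without a running server.

```python
def latest_doc_pipeline(fields):
    """Build a $sort + $group pipeline that keeps each car's newest document.

    fields: names of the fields to carry over from that newest document,
    e.g. ["k_value", "timestamp"].
    """
    group = {"_id": "$car_id"}
    for field in fields:
        # $first takes the value from the first document of each group,
        # which after the $sort below is the one with the latest timestamp.
        group[field] = {"$first": "$" + field}
    return [
        # Python 3.7+ dicts keep insertion order, so pymongo sends the sort
        # keys in the order written: car_id first, then timestamp.
        {"$sort": {"car_id": 1, "timestamp": -1}},
        {"$group": group},
    ]


# With pymongo and a running server (hypothetical connection details):
#   from pymongo import MongoClient
#   cars = MongoClient()["mydb"]["cars"]
#   cars.create_index([("car_id", 1), ("timestamp", -1)])
#   for doc in cars.aggregate(latest_doc_pipeline(["k_value", "timestamp"])):
#       print(doc)
```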

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM