简体   繁体   中英

mongoDB index strategy

I have a collection called Post. I have a mapping system that always ensures that each document has these fields:

  • id (int)
  • target (string)
  • type (string)
  • user_id
  • client_id
  • updated (string, 11 int timestamp)
  • created (string, 11 int timestamp)
  • enabled (bool)

This collection is accessed to be output in an API schema.

So some typical requests might be:

/post?type=image&user_id=2
/post?updated=35234423&order_by=client_id
/post?enabled=true&order_by=id

There is no 100% guarantee certain fields make it to the find or sort field.

Recently when the table reached 8GB of data, I started getting this error:

"localhost:27017: too much data for sort() with no index. add an index or specify a smaller limit"

I have looked at the documentation for Mongo index and found it difficult to understand whether it works in the same way as a MySQL index.

Some threads I found on indexing: MongoDB - too much data for sort() with no index error seem to suggest using specific sort fields to ensure the index is hit. Obviously I cannot do that when alot of my filtering and sorting is optional.

Could anyone suggest what a firm solution would be in terms of whether I should index all fields on my table?


Thanks for the feedback guys, I've started building an auto index function:

public function get() {

        $indices['Post'] = array(
            'fields' =>
                array(
                    'id'                => array('unique' => true, 'dropDups' => true, 'background' => true),
                    'client_id'         => array('dropDups' => true, 'background' => true),
                    'image_id'          => array('dropDups' => true, 'background' => true),
                    'user_id'           => array('dropDups' => true, 'background' => true),
                    'publish_target'    => array('dropDups' => true, 'background' => true),
                    'type'              => array('dropDups' => true, 'background' => true),
                    'status'            => array('dropDups' => true, 'background' => true),
                    'text'              => array('background' => true)
                )
        );

        foreach ($indices as $key => $index) {

            /* set the collection */
            $collection = $this->mongoDB->{$key};

            /* delete the indexes */
            $collection->deleteIndexes();

            /* loop the fields and add the index */
            foreach ($index['fields'] as $subKey => $data) {
                $collection->ensureIndex($subKey, array_merge($data, array('name' => $subKey)));
            }
        }
        /* return the list */
        return $indices;
    }

You should know upfront what sort of queries are going to hit the server. Without that you can't do any optimisation and might run into sort issues like you have now.

If you say that the users get to sort by any of the 9 fields that you have, you will need to create an index on each of them. However you need to remember that sometimes it makes more sense to create a compound index, as preventing the issue for:

/post?updated=35234423&order_by=client_id

Can only done by setting an index on:

{ updated: 1, client_id: 1 }

Indexes in MongoDB can only be used if all left-hand side fields in the index are part of your query.

So: { updated: 1, client_id: 1 } works optimal for:

  • find( { 'updated' : 1 } );
  • find( { 'updated' : 1, 'client_id' : 1 } );
  • find( { 'updated' : 1 } ).sort( { 'client_id' : 1 } );

But not for:

  • find( { 'client_id' : 1 } );
  • find( { 'client_id' : 1 } ).sort( { 'updated' : 1 } );

In order to reduce the amount of data and to prevent your error message you can also additionally add a limit() to each query. With 8MB of results, I doubt your UI can show that many results anyway, so using limit() might make sense.

Unfortunately I cannot think of a really good solution to such a dynamic nature with indexes however, this JIRA https://jira.mongodb.org/browse/SERVER-3071 would really help you.

I suggest you watch that JIRA ticket.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM