Index a large MongoDB collection with Elasticsearch

I have a large collection (~25M documents) in MongoDB and I want to index all of them with Elasticsearch.

In my Node.js code using Mongoose, I am doing the following:

var thebody = [];

Model
    .find({})
    .stream()
    .on('data', function (doc) {
        // Each document needs an action line followed by the document source
        thebody.push({ index: { _index: index, _type: type, _id: doc._id } });
        thebody.push(doc.toObject());
    })
    .on('close', function () {
        // Send everything in a single bulk request once the stream ends
        client.bulk({
            body: thebody
        }, function (err) {
            if (err) console.error(err);
        });
    });

I use the bulk function because I think it is better than indexing each document individually. However, this leads to a memory problem (because of the large thebody array).

Is it better to index each element individually? Anyone know a better solution? (I can't use rivers because my ES version is 2.2)

The Bulk API is definitely the faster and more efficient way to go if you are indexing a huge amount of data.

However, the amount of data you can successfully process in one request also depends on the client's configuration, and you definitely do not want to hold on to a large chunk of client memory.

Why not call the .bulk function in batches of, say, 10k documents?
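A minimal sketch of that batching approach, reusing the Model, client, index and type variables from your question (the 10,000 batch size and the doc.toObject() conversion are just assumptions here):

var BATCH_SIZE = 10000;
var batch = [];

var stream = Model.find({}).stream();

stream.on('data', function (doc) {
    // Every document adds two entries: the action line and the source
    batch.push({ index: { _index: index, _type: type, _id: doc._id } });
    batch.push(doc.toObject());

    if (batch.length >= BATCH_SIZE * 2) {
        // Pause the stream while this batch is sent, then resume
        stream.pause();
        client.bulk({ body: batch }, function (err) {
            if (err) console.error(err);
            batch = [];
            stream.resume();
        });
    }
});

stream.on('close', function () {
    // Flush whatever is left after the stream ends
    if (batch.length > 0) {
        client.bulk({ body: batch }, function (err) {
            if (err) console.error(err);
        });
    }
});

This way only one batch is held in memory at a time instead of the whole 25M-document array.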

Mongoosastic: https://github.com/mongoosastic/mongoosastic/blob/master/README.md

Mongoosastic is a mongoose plugin that can automatically index your models into elasticsearch. The latest version of this package will be as close as possible to the latest elasticsearch and mongoose packages.

npm install -S mongoosastic
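As a rough sketch of how the plugin is wired up (the Article schema and its field names are made up for illustration; see the README for the actual options), you add it to a schema and can then bulk-index existing documents with synchronize():

var mongoose = require('mongoose');
var mongoosastic = require('mongoosastic');

var ArticleSchema = new mongoose.Schema({
    title: String,
    content: String
});

// New and updated documents get indexed into Elasticsearch automatically
ArticleSchema.plugin(mongoosastic);

var Article = mongoose.model('Article', ArticleSchema);

// Existing documents can be indexed in bulk with synchronize()
var stream = Article.synchronize();
var count = 0;

stream.on('data', function (err, doc) {
    count++;
});
stream.on('close', function () {
    console.log('indexed ' + count + ' documents');
});
stream.on('error', function (err) {
    console.error(err);
});

synchronize() streams the collection for you, so you avoid building one giant bulk body yourself.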
