
How to query batch by batch from ElasticSearch in nodejs

I'm trying to get data from Elasticsearch with my Node.js application. My index contains 1 million records, so I cannot send the whole set to another service in one request. That's why I want to fetch 10,000 records per request, for example:

const getCodesFromElasticSearch = async (batch) => {
  // each batch is 10,000 records: batch 1 starts at 0, batch 2 at 10,000, and so on
  const startingCount = (batch - 1) * 10000;
  return await esClient.search({
    index: `myIndex`,
    type: 'codes',
    _source: ['column1', 'column2', 'column3'],
    body: {
      from: startingCount,
      size: 10000,
      query: {
        bool: {
          must: [
              ....
          ],
          filter: {
              ....
          }
        }
      },
      sort: {
        sequence: {
          order: "asc"
        }
      }
    }
  }).then(data => data.hits.hits.map(esObject => esObject._source));
}

It still works when batch=1, but when it goes to batch=2 I get an error saying that from + size must not be larger than 10,000, as per the documentation. I don't want to change index.max_result_window either. Please let me know an alternative way to fetch the records 10,000 by 10,000.

The scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use the cursor on a traditional database.

So you can use the scroll API to get your whole 1M dataset, something like the snippet below, without using from. A normal Elasticsearch search request is limited to 10k records (from + size), so when you try to use from with a greater value it returns an error; that's why scrolling is a good solution for this kind of scenario.

// `esclient` is assumed to be an already-configured Elasticsearch JS client
let allRecords = [];

// first we do a search, and specify a scroll timeout
var { _scroll_id, hits } = await esclient.search({
  index: 'myIndex',
  type: 'codes',
  scroll: '30s',
  body: {
    query: {
      match_all: {}
    },
    _source: ['column1', 'column2', 'column3']
  }
});

while (hits && hits.hits.length) {
  // append all new hits
  allRecords.push(...hits.hits);

  console.log(`${allRecords.length} of ${hits.total}`);

  // fetch the next batch using the scroll id from the previous response
  ({ _scroll_id, hits } = await esclient.scroll({
    scrollId: _scroll_id,
    scroll: '30s'
  }));
}

console.log(`Complete: ${allRecords.length} records retrieved`);

You can also add your own query and sort to this code snippet, as shown in the sketch below.
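
For instance, the initial scroll request could carry the same query, sort, and _source filtering as the question's search. This is only a sketch: the must/filter clauses are placeholders to fill in with your own conditions, and esclient is the same client instance as above.

// Sketch: initial scroll search that also carries a query, a sort and _source filtering.
// The must/filter clauses are placeholders; fill in your own conditions.
var { _scroll_id, hits } = await esclient.search({
  index: 'myIndex',
  type: 'codes',
  scroll: '30s',
  body: {
    size: 10000,
    query: {
      bool: {
        must: [
          // ... your must clauses ...
        ],
        filter: {
          // ... your filter ...
        }
      }
    },
    sort: { sequence: { order: 'asc' } },
    _source: ['column1', 'column2', 'column3']
  }
});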

As per the comments:

Step 1. Do a normal esclient.search and get the hits and the _scroll_id. Here you send the hits data to your other service and keep the _scroll_id for fetching the next batch of data.

Step 2. Use the _scroll_id from the first batch and loop with esclient.scroll until you have gone through all 1M records. Keep in mind that you don't have to wait for all 1M records: inside the while loop, as soon as a response comes back, send it to your service batch by batch (a sketch follows below).
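
Putting both steps together, a sketch like the one below sends each batch to the other service as soon as it arrives instead of buffering all 1M records in memory. Here sendToOtherService is a hypothetical placeholder for whatever call delivers the data, and esclient is assumed to be a configured Elasticsearch JS client.

// Sketch: stream scroll batches to another service as they arrive.
// `sendToOtherService` is a hypothetical placeholder for your own delivery call.
const streamCodesToService = async () => {
  // Step 1: normal search with a scroll timeout, keeping the _scroll_id
  let { _scroll_id, hits } = await esclient.search({
    index: 'myIndex',
    type: 'codes',
    scroll: '30s',
    body: {
      size: 10000,
      query: { match_all: {} },
      _source: ['column1', 'column2', 'column3']
    }
  });

  // Step 2: keep scrolling until a batch comes back empty
  while (hits && hits.hits.length) {
    // send this batch right away instead of waiting for all 1M records
    await sendToOtherService(hits.hits.map(hit => hit._source));

    ({ _scroll_id, hits } = await esclient.scroll({
      scrollId: _scroll_id,
      scroll: '30s'
    }));
  }

  // free the scroll context on the server when done
  // (the exact clearScroll parameter name can differ between client versions)
  await esclient.clearScroll({ scrollId: _scroll_id });
};

Clearing the scroll at the end releases the search context on the server before the scroll timeout expires.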

See Scroll API: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/scroll_examples.html

See Search After: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-request-search-after.html
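
search_after avoids keeping a scroll context open: you repeat the same sorted query and pass the sort values of the last hit of one page as the starting point of the next. A rough sketch, assuming sequence (the sort field from the question) is unique per document, with a hypothetical onBatch callback for delivering each page:

// Sketch: deep pagination with search_after instead of scroll.
// Assumes `sequence` is unique per document; otherwise add a tiebreaker
// field (such as _id) to the sort.
const getCodesWithSearchAfter = async (onBatch) => {
  let searchAfter; // sort values of the last hit of the previous page

  while (true) {
    const body = {
      size: 10000,
      query: { match_all: {} },
      _source: ['column1', 'column2', 'column3'],
      sort: [{ sequence: 'asc' }]
    };
    if (searchAfter) {
      body.search_after = searchAfter;
    }

    const { hits } = await esclient.search({
      index: 'myIndex',
      type: 'codes',
      body
    });

    if (!hits.hits.length) break; // no more pages

    // hand the batch to the caller (e.g. forward it to the other service)
    await onBatch(hits.hits.map(hit => hit._source));

    // remember where this page ended
    searchAfter = hits.hits[hits.hits.length - 1].sort;
  }
};

Unlike scroll, there is no server-side context to keep alive, so this also suits long-running or resumable exports.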
