I'm trying to get data from ElasticSearch with my node application. In my index, there are 1 million records, thus I cannot be sent to another services with the whole records. That's why I want to get 10,000 records per request, as per example:
const getCodesFromElasticSearch = async (batch) => {
let startingCount = 0;
if (batch > 1) {
startingCount = (batch * 1000);
} else if (batch === 1) {
startingCount = 0;
}
return await esClient.search({
index: `myIndex`,
type: 'codes',
_source: ['column1', 'column2', 'column3'],
body: {
from: startingCount,
size: 1000,
query: {
bool: {
must: [
....
],
filter: {
....
}
}
},
sort: {
sequence: {
order: "asc"
}
}
}
}).then(data => data.hits.hits.map(esObject => esObject._source));
}
It's still working when batch=1
. But when goes to batch=2
, that got problem that from
should not be larger than 10,000
as per its documentation. And I don't want to change max_records
as well. Please let me know any alternate way to get 10,000
by 10,000
.
The scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use the cursor on a traditional database.
So you can use scroll
API to get your whole 1M dataset below-something like below without using from because elasticsearch normal search has a limit of 10k record in max request so when you try to use from with greater value then it'll return error, that's why scrolling is good solutions for this kind of scenarios.
let allRecords = [];
// first we do a search, and specify a scroll timeout
var { _scroll_id, hits } = await esclient.search({
index: 'myIndex',
type: 'codes',
scroll: '30s',
body: {
query: {
"match_all": {}
},
_source: ["column1", "column2", "column3"]
}
})
while(hits && hits.hits.length) {
// Append all new hits
allRecords.push(...hits.hits)
console.log(`${allRecords.length} of ${hits.total}`)
var { _scroll_id, hits } = await esclient.scroll({
scrollId: _scroll_id,
scroll: '30s'
})
}
console.log(`Complete: ${allRecords.length} records retrieved`)
You can also add your query
and sort
with this existing code snippets.
As per comment:
Step 1. Do normal esclient.search
and get the hits
and _scroll_id
. Here you need to send the hits
data to your other service and keep the _scroll_id
for a future batch of data calling.
Step 2 Use the _scroll_id
from the first batch and use a while loop until you get all your 1M record with esclient.scroll
. Here you need to keep in mind that you don't need to wait for all of your 1M data, within the while loop when you get response back just send it to your service batch by batch.
See Scroll API : https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/scroll_examples.html
**See Search After **: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-request-search-after.html
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.