How to query batch by batch from ElasticSearch in nodejs

I'm trying to get data from ElasticSearch with my Node application. My index holds 1 million records, so I cannot send all of them to another service at once. That's why I want to fetch 10,000 records per request, for example:

const getCodesFromElasticSearch = async (batch) => {
  // Batches are 1-based: batch 1 starts at offset 0, batch 2 at 1000, and so on.
  // The multiplier must match the `size` passed to the search below.
  const startingCount = batch > 1 ? (batch - 1) * 1000 : 0;
  return await esClient.search({
    index: `myIndex`,
    type: 'codes',
    _source: ['column1', 'column2', 'column3'],
    body: {
      from: startingCount,
      size: 1000,
      query: {
        bool: {
          must: [
              ....
          ],
          filter: {
              ....
          }
        }
      },
      sort: {
        sequence: {
          order: "asc"
        }
      }
    }
  }).then(data => data.hits.hits.map(esObject => esObject._source));
}

This works when batch=1. But when it reaches batch=2, it fails because, per the documentation, from should not be larger than 10,000. I don't want to change max_records either. Please suggest an alternative way to fetch the data 10,000 records at a time.

The scroll API can be used to retrieve large numbers of results (or even all results) from a single search request, in much the same way as you would use a cursor on a traditional database.

So you can use the scroll API to retrieve your whole 1M-record dataset, as in the snippet below, without using from. A normal Elasticsearch search is limited to a window of 10k records, so a request with a larger from value returns an error. That is why scrolling is a good solution for this kind of scenario.
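To make the limit concrete, here is a small sketch of the arithmetic. It assumes the index.max_result_window setting is at its default of 10,000; the helper name isWithinResultWindow is hypothetical and not part of any client API.

```javascript
// Default value of the index.max_result_window setting (assumed unchanged here).
const MAX_RESULT_WINDOW = 10000;

// Hypothetical helper: true when a from/size page can be served by a normal
// search, i.e. when from + size does not exceed the result window.
function isWithinResultWindow(from, size, maxResultWindow = MAX_RESULT_WINDOW) {
  return from + size <= maxResultWindow;
}

console.log(isWithinResultWindow(0, 10000));     // true: first batch of 10,000
console.log(isWithinResultWindow(10000, 10000)); // false: second batch of 10,000
```

Any page whose last record lies beyond the window is rejected, which is why the second batch of 10,000 cannot be fetched with from.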

const allRecords = [];

// First do a search and specify a scroll timeout
let { _scroll_id, hits } = await esclient.search({
  index: 'myIndex',
  type: 'codes',
  scroll: '30s',
  body: {
    query: {
      match_all: {}
    },
    _source: ['column1', 'column2', 'column3']
  }
});

while (hits && hits.hits.length) {
  // Append all new hits
  allRecords.push(...hits.hits);

  console.log(`${allRecords.length} of ${hits.total}`);

  // Fetch the next batch, reusing the scroll id from the previous response
  ({ _scroll_id, hits } = await esclient.scroll({
    scrollId: _scroll_id,
    scroll: '30s'
  }));
}

console.log(`Complete: ${allRecords.length} records retrieved`);

You can also add your own query and sort to this code snippet.
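As a rough illustration of adding a query and sort, the sketch below builds the parameters for the initial scroll search. The helper name buildScrollSearchParams and the batchSize parameter are assumptions made for this example; the index, type, and column names are taken from the question.

```javascript
// Hypothetical helper that builds the parameters for the initial scroll search.
// `query` and `sort` are whatever the original from/size search used.
function buildScrollSearchParams(query, sort, batchSize = 1000) {
  return {
    index: 'myIndex',
    type: 'codes',
    scroll: '30s',
    body: {
      size: batchSize, // number of hits returned per scroll batch
      query,
      sort,
      _source: ['column1', 'column2', 'column3']
    }
  };
}

const params = buildScrollSearchParams(
  { match_all: {} },              // placeholder; substitute the real bool query
  { sequence: { order: 'asc' } }  // same sort as in the question
);
console.log(JSON.stringify(params.body, null, 2));
```

The resulting object can be passed to esclient.search in place of the literal parameters used in the snippet.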

As per comment:

Step 1. Do a normal esclient.search and get the hits and _scroll_id. Send the hits data to your other service and keep the _scroll_id for fetching the next batch.

Step 2. Use the _scroll_id from the first batch in a while loop, calling esclient.scroll until you have retrieved all 1M records. Keep in mind that you don't need to wait for all 1M records: inside the while loop, as soon as a response comes back, send it to your service, batch by batch.
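The two steps could be sketched roughly as follows. This is only an illustration: it assumes the client resolves search and scroll calls to the response body ({ _scroll_id, hits }), as in the snippet earlier in this answer, and the names scrollBatches, forwardAll, and sendToService are hypothetical.

```javascript
// Yields one array of hits per scroll batch instead of accumulating everything.
// Assumes `client.search` and `client.scroll` resolve to { _scroll_id, hits }.
async function* scrollBatches(client, searchParams, scrollTimeout = '30s') {
  // Step 1: the initial search opens the scroll and returns the first batch.
  let { _scroll_id, hits } = await client.search({
    ...searchParams,
    scroll: scrollTimeout
  });

  // Step 2: keep scrolling until a batch comes back empty.
  while (hits && hits.hits.length) {
    yield hits.hits;
    ({ _scroll_id, hits } = await client.scroll({
      scrollId: _scroll_id,
      scroll: scrollTimeout
    }));
  }
}

// Hypothetical consumer: `sendToService` stands in for the call to the other
// service. Each batch is forwarded as soon as it arrives.
async function forwardAll(client, searchParams, sendToService) {
  let total = 0;
  for await (const batch of scrollBatches(client, searchParams)) {
    await sendToService(batch.map(hit => hit._source));
    total += batch.length;
  }
  return total; // number of records forwarded
}
```

Because each batch is handed off before the next one is requested, memory use stays bounded by the batch size rather than by the full 1M records.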

See Scroll API: https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/current/scroll_examples.html

See Search After: https://www.elastic.co/guide/en/elasticsearch/reference/5.2/search-request-search-after.html
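For comparison, a minimal sketch of the search_after approach is shown below. It is not from the original answer: it assumes the sequence field from the question is unique per document (otherwise a tiebreaker field should be added to the sort), and that the client resolves to the response body. The name searchAfterBatches is hypothetical.

```javascript
// Yields batches using search_after instead of from or scroll.
// Assumes `sequence` is unique per document, so no records are skipped or
// repeated at batch boundaries.
async function* searchAfterBatches(client, baseParams, batchSize = 1000) {
  let searchAfter; // sort values of the last hit from the previous batch

  while (true) {
    const body = {
      ...baseParams.body,
      size: batchSize,
      sort: [{ sequence: 'asc' }]
    };
    if (searchAfter) {
      body.search_after = searchAfter;
    }

    const { hits } = await client.search({ ...baseParams, body });
    if (!hits || hits.hits.length === 0) {
      return; // no more records
    }

    yield hits.hits;

    // Each hit carries its sort values when a sort is specified.
    searchAfter = hits.hits[hits.hits.length - 1].sort;
  }
}
```

Unlike scroll, this keeps no search context open on the server between requests; each request simply continues from the sort values of the last record seen.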
