Spring Data Elasticsearch repository with Pageable returns only 10000 documents

Question

I have an index in Elasticsearch with 17364 documents.

$ curl http://localhost:9200/performance/_count
{"count":17364,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0}}

The Spring Data repository:

public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
}

Fetching all documents page by page and printing them:

public void testReport() throws JsonProcessingException {

  int page = 0, pageSize = 1000;
  Pageable of = PageRequest.of(page, pageSize);

  Page<Transaction> all = testRepository.findAll(of);
  int numberOfPages = all.getTotalPages();

  log.info("All pages: {}, {}", numberOfPages, all.getTotalElements());
  while (true) {
     log.info("Current page: {}, {}", of.getPageNumber(), of.getPageSize());
     for (Transaction txn : all) {
        log.info(mapper.writeValueAsString(txn));
     }
     if (!all.hasNext()) {
        break; // the repository reports no further page
     }
     of = of.next();
     all = testRepository.findAll(of); // fetch the next page with the same repository
  }

}

Although the index has 17364 documents, this code returns only 10000 of them. Can you help me figure out why?

Elasticsearch version: 7.9
spring-boot-starter-parent: 2.3.2.RELEASE

Answer 1

I see two options:

A. Since you only have 17364 documents, you can increase the index.max_result_window setting of your index to (for example) 20000, so that you can page all the way to the end:

PUT performance/_settings
{
  "index.max_result_window": 20000
}
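The 10000 cap comes from the rule that, for from/size paging, from + size must not exceed index.max_result_window (10000 by default). Assuming Spring Data sends PageRequest.of(page, size) as from = page × size, a page size of 1000 means only pages 0 to 9 fit. The class below is a hypothetical helper (ResultWindow, fits and reachableDocuments are illustrative names, not Spring Data or Elasticsearch API) that models this arithmetic:

```java
/**
 * Hypothetical helper (not part of Spring Data or Elasticsearch) that models the
 * rule Elasticsearch applies to from/size paging: from + size <= index.max_result_window.
 */
final class ResultWindow {

    private ResultWindow() {
    }

    /** True if 0-based page `page` of `pageSize` hits stays inside the window. */
    static boolean fits(int page, int pageSize, int maxResultWindow) {
        long from = (long) page * pageSize; // assumed offset for PageRequest.of(page, pageSize)
        return from + pageSize <= maxResultWindow;
    }

    /** How many of totalDocs documents can be reached by paging until a page no longer fits. */
    static long reachableDocuments(long totalDocs, int pageSize, int maxResultWindow) {
        long reachable = 0;
        for (int page = 0; (long) page * pageSize < totalDocs; page++) {
            if (!fits(page, pageSize, maxResultWindow)) {
                break; // Elasticsearch would reject this request
            }
            reachable += Math.min(pageSize, totalDocs - (long) page * pageSize);
        }
        return reachable;
    }

    public static void main(String[] args) {
        System.out.println(reachableDocuments(17364, 1000, 10000)); // 10000
        System.out.println(reachableDocuments(17364, 1000, 20000)); // 17364
    }
}
```

With the window raised to 20000, the last page of the question starts at from = 17000, and 17000 + 1000 = 18000 stays within the limit, so all 17364 documents become reachable.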

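B. If you have a larger index and/or increasing the index.max_result_window limit is not an option for whatever reason, you need to use the Scroll API. Spring Data ES supports two ways of doing that.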

The first approach uses the ElasticsearchTemplate.searchForStream() method, which uses the Scroll API internally:

// searchQuery is your Query, e.g. one that matches all documents
SearchHitsIterator<Transaction> stream = elasticsearchTemplate.searchForStream(searchQuery, Transaction.class, IndexCoordinates.of("performance"));
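searchForStream() hides the scroll round trips behind an iterator: one batch is held in memory and the next batch is requested only when the current one is used up, so no from offset is sent and the from/size window does not limit how far you can read. The following is a simplified, JDK-only model of that behaviour, not Spring Data's implementation; ScrollCursor, iterate and rangeCursor are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Simplified model (hypothetical, not Spring Data code) of a scroll-backed iterator. */
final class ScrollIteratorModel {

    /** Hypothetical stand-in for a server-side scroll context. */
    interface ScrollCursor<T> {
        List<T> nextBatch(); // an empty list means the scroll is exhausted
    }

    static <T> Iterator<T> iterate(ScrollCursor<T> cursor) {
        return new Iterator<T>() {
            private List<T> batch = cursor.nextBatch(); // first batch, like the initial search
            private int index = 0;

            @Override
            public boolean hasNext() {
                if (index < batch.size()) {
                    return true;
                }
                if (batch.isEmpty()) {
                    return false; // an earlier fetch already signalled the end
                }
                batch = cursor.nextBatch(); // fetch lazily, like a scroll "continue"
                index = 0;
                return !batch.isEmpty();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return batch.get(index++);
            }
        };
    }

    /** In-memory cursor over 0..total-1 in batches of batchSize, for demonstration only. */
    static ScrollCursor<Integer> rangeCursor(int total, int batchSize) {
        return new ScrollCursor<Integer>() {
            private int position = 0; // kept "server-side"; the client never sends an offset

            @Override
            public List<Integer> nextBatch() {
                List<Integer> batch = new ArrayList<>();
                while (batch.size() < batchSize && position < total) {
                    batch.add(position++);
                }
                return batch;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> it = iterate(rangeCursor(17364, 1000));
        int count = 0;
        while (it.hasNext()) {
            it.next();
            count++;
        }
        System.out.println(count); // 17364
    }
}
```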

The second approach is a bit more low-level. You need to modify the repository definition to add a method that returns a Stream:

public interface TestRepository extends ElasticsearchRepository<Transaction, String> {
    Stream<Transaction> findScrollAll();
}

Then use ElasticsearchTemplate.searchScrollStart() and ElasticsearchTemplate.searchScrollContinue().
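Driving the scroll by hand follows a fixed protocol: a start call returns the first batch together with a scroll id, each continue call passes the most recent scroll id to get the next batch, an empty batch marks the end, and the scroll context should be cleared afterwards. The sketch below models that loop against a toy in-memory index; ScrollProtocolModel and its methods are hypothetical and only mirror the shape of searchScrollStart()/searchScrollContinue(), not their actual signatures:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Toy in-memory "index" (hypothetical) mimicking start / continue / clear of a scroll. */
final class ScrollProtocolModel {

    /** One scroll response: the id to continue with and the hits in this batch. */
    static final class ScrollPage {
        final String scrollId;
        final List<Integer> hits;

        ScrollPage(String scrollId, List<Integer> hits) {
            this.scrollId = scrollId;
            this.hits = hits;
        }
    }

    private final int totalDocs;
    private final Map<String, Integer> positions = new HashMap<>(); // scroll id -> next document
    private final Map<String, Integer> batchSizes = new HashMap<>();

    ScrollProtocolModel(int totalDocs) {
        this.totalDocs = totalDocs;
    }

    ScrollPage start(int batchSize) {
        String id = UUID.randomUUID().toString();
        positions.put(id, 0);
        batchSizes.put(id, batchSize);
        return continueScroll(id);
    }

    ScrollPage continueScroll(String scrollId) {
        int from = positions.get(scrollId);
        int to = Math.min(from + batchSizes.get(scrollId), totalDocs);
        List<Integer> hits = new ArrayList<>();
        for (int doc = from; doc < to; doc++) {
            hits.add(doc);
        }
        positions.put(scrollId, to); // the position lives on the "server"
        return new ScrollPage(scrollId, hits);
    }

    void clear(String scrollId) {
        positions.remove(scrollId);
        batchSizes.remove(scrollId);
    }

    int openContexts() {
        return positions.size();
    }

    /** The client-side loop: start, continue until an empty batch, always clear. */
    static int readAll(ScrollProtocolModel index, int batchSize) {
        int read = 0;
        ScrollPage page = index.start(batchSize);
        try {
            while (!page.hits.isEmpty()) {
                read += page.hits.size(); // process the batch here
                page = index.continueScroll(page.scrollId);
            }
        } finally {
            index.clear(page.scrollId); // release the scroll context
        }
        return read;
    }

    public static void main(String[] args) {
        ScrollProtocolModel index = new ScrollProtocolModel(17364);
        System.out.println(readAll(index, 1000)); // 17364
        System.out.println(index.openContexts()); // 0
    }
}
```

The loop always continues with the scroll id from the latest response, because with a real cluster that id is not guaranteed to stay the same between calls.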

Edit:

A third option:

Just define a method

Stream<SearchHit<Transaction>> searchBy();

in your TestRepository, or use the return type Stream<Transaction> instead.
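A repository method returning Stream is fed lazily, batch by batch, from a scroll, so it is good practice to close the stream (for example with try-with-resources), presumably so the underlying scroll context can be released. The sketch below shows that mechanism with JDK types only: an iterator is wrapped into a Stream via Spliterators/StreamSupport, and an onClose handler stands in for clearing the scroll. toStream and countAll are hypothetical names, not Spring Data API:

```java
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** JDK-only model (hypothetical) of a closable, lazily fetched result stream. */
final class ClosableStreamModel {

    /** Wraps a lazily fetching iterator into a Stream whose close() runs onClose. */
    static <T> Stream<T> toStream(Iterator<T> iterator, Runnable onClose) {
        Spliterator<T> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, false).onClose(onClose);
    }

    /** Consumes the stream inside try-with-resources, so onClose runs even on failure. */
    static long countAll(Iterator<Integer> hits, Runnable onClose) {
        try (Stream<Integer> stream = toStream(hits, onClose)) {
            return stream.count();
        }
    }

    public static void main(String[] args) {
        AtomicBoolean cleared = new AtomicBoolean(false);
        long count = countAll(IntStream.range(0, 17364).iterator(), () -> cleared.set(true));
        System.out.println(count);         // 17364
        System.out.println(cleared.get()); // true
    }
}
```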


Answer 1
1 Accepted 2020-11-12 12:01:53
