Akka stream stops processing data

Question

When I run the stream below, it does not receive any subsequent data once it has started.

    final long HOUR = 3600000;
    final long PAST_HOUR = System.currentTimeMillis()-HOUR;

    private final static ActorSystem actorSystem = ActorSystem.create(Behaviors.empty(), "as");

    protected static ElasticsearchParams constructElasticsearchParams(
            String indexName, String typeName, ApiVersion apiVersion) {
        if (apiVersion == ApiVersion.V5) {
            return ElasticsearchParams.V5(indexName, typeName);
        } else if (apiVersion == ApiVersion.V7) {
            return ElasticsearchParams.V7(indexName);
        } else {
            throw new IllegalArgumentException("API version " + apiVersion + " is not supported");
        }
    }

    String queryStr = "{ \"bool\": {  \"must\" : [{\"range\" : {"+
            "\"timestamp\" : { "+
            "\"gte\" : "+PAST_HOUR
            +" }} }]}} ";

    ElasticsearchConnectionSettings connectionSettings =
            ElasticsearchConnectionSettings.create("****")
                    .withCredentials("****", "****");

    ElasticsearchSourceSettings sourceSettings =
            ElasticsearchSourceSettings.create(connectionSettings)
                    .withApiVersion(ApiVersion.V7);

    Source<ReadResult<Stats>, NotUsed> dataSource =
            ElasticsearchSource.typed(
                    constructElasticsearchParams("data", "_doc", ApiVersion.V7),
                    queryStr,
                    sourceSettings,
                    Stats.class);

    dataSource.buffer(10000, OverflowStrategy.backpressure());
    dataSource.backpressureTimeout(Duration.ofSeconds(1));

    dataSource
            .log("error")
            .runWith(Sink.foreach(a -> System.out.println(a)), actorSystem);

This produces the output:

ReadResult(id=1656107389556,source=Stats(size=0.09471),version=)

Data is continually being written to the index data, but the stream does not process it once it has started. Shouldn't the stream continually process data from the upstream source? In this case, the upstream source is an Elasticsearch index named data.

I've tried amending the query to match all documents:

    String queryStr = "{\"match_all\": {}}";

but I get the same result.

Answer 1

The Elasticsearch source does not run continuously. It initiates a search, manages pagination (using the scroll API), and streams the results; when Elasticsearch reports no more results, it completes.

You could do something like

    Source.repeat(Done.getInstance()).flatMapConcat(done -> ElasticsearchSource.typed(...))

which will run a new search immediately after the previous one finishes. Note that it is then the downstream's responsibility to filter out duplicates.
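One way the downstream could filter duplicates is to remember the document ids it has already emitted. The following is only a minimal sketch using the Java standard library: the class name SeenIdFilter, the method firstTime, and the capacity of 10,000 are hypothetical choices for illustration, not part of Akka or Alpakka. It keeps a bounded, least-recently-used set of ids so that memory does not grow without limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: remembers recently seen document ids (bounded, LRU eviction). */
final class SeenIdFilter {
    private final Map<String, Boolean> seen;

    SeenIdFilter(final int capacity) {
        // Access-ordered map that drops the least recently used id once capacity is exceeded.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true only if the id is not currently remembered; remembers it either way. */
    synchronized boolean firstTime(String id) {
        return seen.put(id, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        SeenIdFilter filter = new SeenIdFilter(10_000);
        // Two simulated search passes whose results overlap on id "2".
        String[][] passes = { {"1", "2"}, {"2", "3"} };
        for (String[] pass : passes) {
            for (String id : pass) {
                if (filter.firstTime(id)) {
                    System.out.println("new document " + id);
                }
            }
        }
    }
}
```

In the stream itself, such a filter would presumably sit right after the repeated source, for example as .filter(r -> seenIds.firstTime(r.id())), assuming ReadResult exposes its id that way. An id evicted from the bounded set will be treated as new again, so the capacity should comfortably exceed the number of documents a single search can return.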
