
How to store values from Trident/Storm in a List (using the Java API)

I'm trying to create a few unit tests to verify that certain parts of my Trident topology are doing what they are supposed to.

I'd like to be able to retrieve all the values produced by running the topology and put them in a List so I can "see" them and check conditions on them.

FeederBatchSpout feederSpout = new FeederBatchSpout("some_time_field", "foo_id");
TridentTopology topology = new TridentTopology();
topology.newStream("spout1", feederSpout)
    .groupBy(new Fields("some_time_field", "foo_id"))
    .aggregate(new Fields("foo_id"), new FooAggregator(),
               new Fields("aggregated_foos"))
    // Soo... how do I retrieve the "aggregated_foos" from here?

I am running the topology as a TrackedTopology (I got the code from another SO question; thank you @brianghig for asking it and @Thomas Kielbus for the reply).

This is how I "launch" the topology and how I feed sample values into it:

TrackedTopology tracked = Testing.mkTrackedTopology(cluster, topology.build());
cluster.submitTopology("unit_tests", config, tracked.getTopology());

feederSpout.feed(new Values(MyUtils.makeSampleFoo(1)));
feederSpout.feed(new Values(MyUtils.makeSampleFoo(2)));

When I do this, I can see in the log messages that the topology is running correctly and that the values are calculated properly, but I'd like to "fish" the results out into a List (or any structure, at this point) so I can actually put some asserts in my tests.

I've been trying a ton of different approaches, but none of them work.

The latest idea was adding a bolt after the aggregation so it would "persist" my values into a list:

Below you'll see the class that tries to go through all the tuples emitted by the aggregate and put them in a list that I had previously initialized:

class FieldFetcherStateUpdater extends BaseStateUpdater<FieldFetcherState> {
    final List<AggregatedFoo> results;

    public FieldFetcherStateUpdater(List<AggregatedFoo> results) {
        this.results = results;
    }

    @Override
    public void updateState(FieldFetcherState state, List<TridentTuple> tuples,
                            TridentCollector collector) {
        for (TridentTuple tuple : tuples) {
            results.add((AggregatedFoo) tuple.getValue(0));
        }
    }
}

So now the code would look like:

// ...
List<AggregatedFoo> results = new ArrayList<>();
topology.newStream("spout1", feederSpout)
    .groupBy(new Fields("some_time_field", "foo_id"))
    .aggregate(new Fields("foo_id"), new FooAggregator(),
               new Fields("aggregated_foos"))
    .partitionPersist(new FieldFetcherFactory(),
                        new Fields("aggregated_foos"),
                        new FieldFetcherStateUpdater(results));

LOGGER.info("Done. Checkpoint results={}", results);

But nothing... The logs show Done. Checkpoint results=[] (empty list).

Is there a way to get that? I imagine it must be doable, but I haven't been able to figure out a way...

Any hint or link to pages or anything of the like will be appreciated. Thank you in advance.

You need to make results a static member variable. If you have multiple parallel tasks running (i.e., parallelism_hint > 1), you also need to synchronize write access to results.

In your case, results will be empty because Storm internally creates a new instance of your bolt (including a new instance of the ArrayList). Using a static variable ensures that you get access to the correct object, as there will be only one shared across all instances of your bolt.
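To make the effect concrete, here is a minimal sketch that uses only the JDK, no Storm classes. It assumes the "new instance" comes from a Java serialization round trip of the component, which the sketch imitates by hand. RecordingUpdater, SHARED_RESULTS and SharedResultsSketch are hypothetical names made up for this illustration and are not part of Storm or of the question's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a state updater; this is NOT a Storm class.
class RecordingUpdater implements Serializable {
    private static final long serialVersionUID = 1L;

    // Static: one list per JVM, shared by every instance (static fields are not serialized).
    // Wrapped in synchronizedList so parallel tasks can add to it safely.
    static final List<String> SHARED_RESULTS =
            Collections.synchronizedList(new ArrayList<>());

    // Instance field: serialized with the object, so a deserialized copy gets its own list.
    final List<String> instanceResults;

    RecordingUpdater(List<String> instanceResults) {
        this.instanceResults = instanceResults;
    }

    void update(String value) {
        instanceResults.add(value); // lands in the copy's private list
        SHARED_RESULTS.add(value);  // lands in the single static list
    }
}

class SharedResultsSketch {
    // Serializes and deserializes the updater (assumed to mimic what happens on submission).
    static RecordingUpdater roundTrip(RecordingUpdater original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (RecordingUpdater) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> results = new ArrayList<>();                  // the list the test holds
        RecordingUpdater running = roundTrip(new RecordingUpdater(results));
        running.update("foo-1");
        running.update("foo-2");
        System.out.println("list held by the test: " + results);
        System.out.println("static list: " + RecordingUpdater.SHARED_RESULTS);
    }
}
```

The list passed to the constructor stays empty because the deserialized copy writes into its own list, while the static list receives both values. Note that a static field is only shared within one JVM, so this trick fits tests that run the topology in-process; remember to clear the static list between tests.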
