
How to add Timer metrics to java.util.Stream

The general concern is how to add timing metrics to various parts of a java.util.Stream execution. At the terminal operation it is easy to time the total operation, e.g. (using the codahale library):

try (Context ctx = timer.time()){
   stream.count();
}

But what about "per-item" timing? Or how to add timers to the intermediate portions of a stream, e.g., timing how long the first 5 stages of a 10-stage stream take?

It is easy to time individual steps in intermediate stages merely by adding timers to those methods. And the initial Spliterator code could measure the time between the first call to tryAdvance and the close() method (it would have to add an onClose Runnable to the stream it generates). That at least allows stream-supplying libraries to use Timers even though they don't know how their streams are being transformed and consumed.
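The Spliterator idea above could be sketched roughly as follows. This is only an illustration under assumptions: TimedSource is a made-up name, the elapsed time is handed to a plain LongConsumer rather than to a codahale Timer (something like nanos -> timer.update(nanos, TimeUnit.NANOSECONDS) should bridge the two), and the stream is sequential.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.function.LongConsumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper; neither the JDK nor the metrics library ships this.
final class TimedSource<T> extends Spliterators.AbstractSpliterator<T> {
    private final Spliterator<T> source;
    private boolean started;   // set on the first tryAdvance call
    private long startNanos;   // System.nanoTime() at that first call

    private TimedSource(Spliterator<T> source) {
        // SORTED is masked out so getComparator() never has to be delegated.
        super(source.estimateSize(), source.characteristics() & ~Spliterator.SORTED);
        this.source = source;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (!started) {
            started = true;
            startNanos = System.nanoTime();
        }
        return source.tryAdvance(action);
    }

    /**
     * Returns a sequential stream over the source. When the stream is closed,
     * the time since the first tryAdvance call is reported in nanoseconds.
     * Nothing is reported if the stream was never traversed.
     */
    static <T> Stream<T> stream(Spliterator<T> source, LongConsumer elapsedNanos) {
        TimedSource<T> timed = new TimedSource<>(source);
        return StreamSupport.stream(timed, false).onClose(() -> {
            if (timed.started) {
                elapsedNanos.accept(System.nanoTime() - timed.startNanos);
            }
        });
    }
}
```

The catch is that the measurement only fires if the consumer actually closes the stream, e.g. with try-with-resources, and that the number includes everything downstream did between the first element and close().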

It would be wonderful to write something like:

List result = stream
   // stream ops ...etc...etc
   .timeTotalOperation(totalOpTimer) //time between first traverse and close()
   .timePerItemOperation(perItemTimer) //"forEach" timer at this stage
   .collect(Collectors.toList());

but we can't add these methods to the Stream interface, obviously.

It doesn't seem to make any sense to wrap the Stream with a delegating pattern. As far as I can tell, the "right" implementation is to tap into the Pipeline classes, and they are inaccessible and (possibly) subject to change.

I can't even extend Collectors to time the terminal stage, since its classes are final or package-private. While I can roll my own Collector and call stream.collect(Collector) myself, that gives up all the useful functionality in Collectors. However, it should be possible to write a CollectorDelegate class that wraps a collector returned from Collectors, e.g.,

List result = stream
   .collect(new TimingCollector(Collectors.toList(), totalOpTimer, perItemTimer));
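A delegating collector along those lines might look like the sketch below. Several things here are assumptions rather than anything from a library: the class is hypothetical, timings go to LongConsumer callbacks in nanoseconds instead of to codahale Timers, a static factory is used instead of the constructor call shown above, and "per item" means only the wrapped accumulator call, not the upstream stages. One instance is meant to serve a single collect() call.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.LongConsumer;
import java.util.function.Supplier;
import java.util.stream.Collector;

// Hypothetical delegating collector; callbacks must be thread-safe if the stream is parallel.
final class TimingCollector<T, A, R> implements Collector<T, A, R> {
    private final Collector<T, A, R> delegate;
    private final LongConsumer totalNanos;
    private final LongConsumer perItemNanos;
    private final AtomicBoolean started = new AtomicBoolean();
    private volatile long startNanos;

    private TimingCollector(Collector<T, A, R> delegate, LongConsumer totalNanos, LongConsumer perItemNanos) {
        this.delegate = delegate;
        this.totalNanos = totalNanos;
        this.perItemNanos = perItemNanos;
    }

    static <T, A, R> Collector<T, A, R> of(Collector<T, A, R> delegate,
                                           LongConsumer totalNanos, LongConsumer perItemNanos) {
        return new TimingCollector<>(delegate, totalNanos, perItemNanos);
    }

    @Override
    public Supplier<A> supplier() {
        Supplier<A> s = delegate.supplier();
        return () -> {
            // The clock starts when the first result container is created.
            if (started.compareAndSet(false, true)) {
                startNanos = System.nanoTime();
            }
            return s.get();
        };
    }

    @Override
    public BiConsumer<A, T> accumulator() {
        BiConsumer<A, T> acc = delegate.accumulator();
        return (container, item) -> {
            long t0 = System.nanoTime();
            acc.accept(container, item);
            perItemNanos.accept(System.nanoTime() - t0);
        };
    }

    @Override
    public BinaryOperator<A> combiner() {
        return delegate.combiner();
    }

    @Override
    public Function<A, R> finisher() {
        Function<A, R> fin = delegate.finisher();
        return container -> {
            R result = fin.apply(container);
            totalNanos.accept(System.nanoTime() - startNanos);
            return result;
        };
    }

    @Override
    public Set<Characteristics> characteristics() {
        // IDENTITY_FINISH is dropped so the stream always calls finisher(),
        // which is where the total time is reported.
        Set<Characteristics> c = EnumSet.noneOf(Characteristics.class);
        c.addAll(delegate.characteristics());
        c.remove(Characteristics.IDENTITY_FINISH);
        return Collections.unmodifiableSet(c);
    }
}
```

Usage would then be stream.collect(TimingCollector.of(Collectors.toList(), totalCallback, perItemCallback)). Because a sequential stream is pulled lazily by the terminal operation, the "total" here should roughly cover the whole traversal, but that relies on how the current implementation behaves rather than on anything the Collector contract promises.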

It must be admitted that the concept of "per-item" is "iffy", given the complexities of Stream use cases. There are probably operations where "per-item" timing doesn't even make sense. But even for the simplest use cases for Streams, I can't figure out a clean way to do this.

Such an open-ended issue poses too many questions for a good thread, so let me attempt to pose just one. Read a stream from a database, convert to Java objects, measure just the reading from the database and the conversion to Java, then forward the stream to a consumer to do more work, but do not time that portion:

import java.util.function.Consumer;
import java.util.stream.Stream;


interface SQLResultSetSupplier {

    default Stream<Object[]> generateStream() {
        return Stream.generate(this::getExpensiveResultSet);
    }
    Object[] getExpensiveResultSet();

    Object expensivelyConvertToJava(Object[] row);
}


public class StreamTimerExample {

    public void example(SQLResultSetSupplier supplier, Consumer<Object> reportConsumer) {
        /**
         * Supplier performs a database query and returns a Stream on the ResultSet.
         * Convert each row of the ResultSet to a Java object.
         * Measure JUST THE ABOVE on a per-item basis.
         *
         * Then send the stream on to a Consumer, e.g., to generate a report.
         * Do NOT measure this second portion.
         */
        Stream<Object[]> baseStream = supplier.generateStream();
        Stream<Object> expensiveOperationStream = baseStream.map(t -> supplier.expensivelyConvertToJava(t)); // measure this
        expensiveOperationStream.forEach(reportConsumer); //don't measure this
    }

}
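For this concrete case, one low-tech option is to wrap the supplier and the mapping function themselves, so each call is timed on its own. The sketch below assumes a hypothetical Timed helper and LongConsumer callbacks receiving nanoseconds; with the codahale library, nanos -> timer.update(nanos, TimeUnit.NANOSECONDS) should work as the callback.

```java
import java.util.function.Function;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Hypothetical helper: decorates a lambda so each call reports its own duration.
final class Timed {
    private Timed() {}

    /** Wraps fn so the duration of every apply() call is reported in nanoseconds. */
    static <T, R> Function<T, R> function(Function<? super T, ? extends R> fn, LongConsumer nanos) {
        return input -> {
            long t0 = System.nanoTime();
            try {
                return fn.apply(input);
            } finally {
                // Reported even if fn throws, so failures are timed as well.
                nanos.accept(System.nanoTime() - t0);
            }
        };
    }

    /** Wraps s so the duration of every get() call is reported in nanoseconds. */
    static <T> Supplier<T> supplier(Supplier<? extends T> s, LongConsumer nanos) {
        return () -> {
            long t0 = System.nanoTime();
            try {
                return s.get();
            } finally {
                nanos.accept(System.nanoTime() - t0);
            }
        };
    }
}
```

In the example above this would mean Stream.generate(Timed.supplier(this::getExpensiveResultSet, readCallback)) in generateStream() and baseStream.map(Timed.function(supplier::expensivelyConvertToJava, convertCallback)) in example(). The wrapped call returns before the element is pushed downstream, so the time spent in reportConsumer is not counted. It only covers stages whose lambdas you control, though, which is the limitation the question is about.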

My gut feeling here: you are investing your time in the wrong place.

In the end, you intend to spend a lot of time and energy implementing your own code instrumentation.

Meaning: why the focus on "streams"? In the end, what matters is the overall performance of your "end user" functionality. Sure, the streams might make up a significant part of that. But you are still investing a lot of energy to create visibility ... for a very specific "corner" of your system.

I would suggest a different strategy: use a profiler instead and measure end-to-end use cases. And then, you could still (pretty easily) configure the profiler to restrict measurements to stream operations.
