简体   繁体   中英

Are Clojure transducers the same concept as intermediate operations on streams in Java?

As I was learning about transducers in Clojure it suddenly struck me what they reminded me of: Java 8 streams!

Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element.

A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.

Clojure:

(def xf
  (comp
    (filter odd?)
    (map inc)
    (take 5)))

(println
  (transduce xf + (range 100)))  ; => 30
(println
  (into [] xf (range 100)))      ; => [2 4 6 8 10]

Java:

// Purposely using Function and boxed primitive streams (instead of
// UnaryOperator<LongStream>) in order to keep it general.
Function<Stream<Long>, Stream<Long>> xf =
        s -> s.filter(n -> n % 2L == 1L)
                .map(n -> n + 1L)
                .limit(5L);

System.out.println(
        xf.apply(LongStream.range(0L, 100L).boxed())
                .reduce(0L, Math::addExact));    // => 30
System.out.println(
        xf.apply(LongStream.range(0L, 100L).boxed())
                .collect(Collectors.toList()));  // => [2, 4, 6, 8, 10]

Apart from the difference in static/dynamic typing, these seem quite similar to me in purpose and usage.

Is the analogy with transformations of Java streams a reasonable way of thinking about transducers? If not, how is it flawed, or how do the two differ in concept (not to speak of implementation)?

The main difference is that the set of verbs (operations) is somehow closed for streams while it's open for transducers: try for example to implement partition on streams, it feels a bit second class:

import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.Stream.Builder;

public class StreamUtils {
    static <T> Stream<T> delay(final Supplier<Stream<T>> thunk) {
        return Stream.of((Object) null).flatMap(x -> thunk.get());
    }

    static class Partitioner<T> implements Function<T, Stream<Stream<T>>> {
        final Function<T, ?> f;

        Object prev;
        Builder<T> sb;

        public Partitioner(Function<T, ?> f) {
            this.f = f;
        }

        public Stream<Stream<T>> apply(T t) {
            Object tag = f.apply(t);
            if (sb != null && prev.equals(tag)) {
                sb.accept(t);
                return Stream.empty();
            }
            Stream<Stream<T>> partition = sb == null ? Stream.empty() : Stream.of(sb.build());
            sb = Stream.builder();
            sb.accept(t);
            prev = tag;
            return partition;
        }

        Stream<Stream<T>> flush() {
            return sb == null ? Stream.empty() : Stream.of(sb.build());
        }
    }

    static <T> Stream<Stream<T>> partitionBy(Stream<T> in, Function<T, ?> f) {
        Partitioner<T> partitioner = new Partitioner<>(f);
        return Stream.concat(in.flatMap(partitioner), delay(() -> partitioner.flush()));
    }
}

Also like sequences and reducers, when you transform you don't create a "bigger" computation, you create a "bigger" source.

To be able to pass computations, you've introduced xf a function from Stream to Stream to lift operations from methods to first class entities (so as to untie them from the source). By doing so you've created a transducer albeit with a too large interface.

Below is a more general version of the above code to apply any (clojure) transducer to a Stream:

import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.Stream.Builder;

import clojure.lang.AFn;
import clojure.lang.IFn;
import clojure.lang.Reduced;

public class StreamUtils {
    static <T> Stream<T> delay(final Supplier<Stream<T>> thunk) {
        return Stream.of((Object) null).flatMap(x -> thunk.get());
    }

    static class Transducer implements Function {
        IFn rf;

        public Transducer(IFn xf) {
            rf = (IFn) xf.invoke(new AFn() {
                public Object invoke(Object acc) {
                    return acc;
                }

                public Object invoke(Object acc, Object item) {
                    ((Builder<Object>) acc).accept(item);
                    return acc;
                }
            });
        }

        public Stream<?> apply(Object t) {
            if (rf == null) return Stream.empty();
            Object ret = rf.invoke(Stream.builder(), t);
            if (ret instanceof Reduced) {
                Reduced red = (Reduced) ret;
                Builder<?> sb = (Builder<?>) red.deref();
                return Stream.concat(sb.build(), flush());
            }
            return ((Builder<?>) ret).build();
        }

        Stream<?> flush() {
            if (rf == null) return Stream.empty();
            Builder<?> sb = (Builder<?>) rf.invoke(Stream.builder());
            rf = null;
            return sb.build();
        }
    }

    static <T> Stream<?> withTransducer(Stream<T> in, IFn xf) {
        Transducer transducer = new Transducer(xf);
        return Stream.concat(in.flatMap(transducer), delay(() -> transducer.flush()));
    }
}

Another important difference that I see is that Clojure Transducers are composable . I often have the situation that my stream pipelines are a bit longer than in your example, where there are just some intermediate steps that I could re-use elsewhere, eg:

someStream
   .map(...)
   .filter(...)
   .map(...)      // <- gee, there are at least two other
   .filter(...)   // <- pipelines where I could use the functionality
   .map(...)      // <- of just these three steps!
   .filter(...)
   .collect(...)

I haven't found a sane way to achieve that. What I wish I had was something like this:

Transducer<Integer,String> smallTransducer = s -> s.map(...); // usable in a stream Integer -> String
Transducer<String,MyClass> otherTransducer = s -> s.filter(...).map(...); // stream String -> MyClass
Transducer<Integer,MyClass> combinedTransducer = smallTransducer.then(otherTransducer); // compose transducers, to get an Integer -> MyClass transducer

and then use it like this:

someStream
   .map(...)
   .filter(...)
   .transduce(smallTransducer)
   .transduce(otherTransducer)
   .filter(...)
   .collect(...)

// or

someStream
   .map(...)
   .filter(...)
   .transduce(combinedTransducer)
   .filter(...)
   .collect(...)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM