Are Clojure transducers the same concept as intermediate operations on streams in Java?

Question

As I was learning about transducers in Clojure, it suddenly struck me what they reminded me of: Java 8 streams!

Transducers are composable algorithmic transformations. They are independent from the context of their input and output sources and specify only the essence of the transformation in terms of an individual element.

A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.

Clojure:

(def xf
  (comp
    (filter odd?)
    (map inc)
    (take 5)))

(println
  (transduce xf + (range 100)))  ; => 30
(println
  (into [] xf (range 100)))      ; => [2 4 6 8 10]

Java:

// Purposely using Function and boxed primitive streams (instead of
// UnaryOperator<LongStream>) in order to keep it general.
Function<Stream<Long>, Stream<Long>> xf =
        s -> s.filter(n -> n % 2L == 1L)
                .map(n -> n + 1L)
                .limit(5L);

System.out.println(
        xf.apply(LongStream.range(0L, 100L).boxed())
                .reduce(0L, Math::addExact));    // => 30
System.out.println(
        xf.apply(LongStream.range(0L, 100L).boxed())
                .collect(Collectors.toList()));  // => [2, 4, 6, 8, 10]

Apart from the difference in static/dynamic typing, these seem quite similar to me in purpose and usage.

Is the analogy with transformations of Java streams a reasonable way of thinking about transducers? If not, how is it flawed, or how do the two differ in concept (not to speak of implementation)?

Answer 1

The main difference is that the set of verbs (operations) is closed for streams while it is open for transducers. Try, for example, to implement partition on streams; it feels a bit second class:

import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.Stream.Builder;

public class StreamUtils {
    static <T> Stream<T> delay(final Supplier<Stream<T>> thunk) {
        return Stream.of((Object) null).flatMap(x -> thunk.get());
    }

    static class Partitioner<T> implements Function<T, Stream<Stream<T>>> {
        final Function<T, ?> f;

        Object prev;
        Builder<T> sb;

        public Partitioner(Function<T, ?> f) {
            this.f = f;
        }

        public Stream<Stream<T>> apply(T t) {
            Object tag = f.apply(t);
            // Null-safe comparison: f may legitimately return null as a tag.
            if (sb != null && java.util.Objects.equals(prev, tag)) {
                sb.accept(t);
                return Stream.empty();
            }
            Stream<Stream<T>> partition = sb == null ? Stream.empty() : Stream.of(sb.build());
            sb = Stream.builder();
            sb.accept(t);
            prev = tag;
            return partition;
        }

        Stream<Stream<T>> flush() {
            return sb == null ? Stream.empty() : Stream.of(sb.build());
        }
    }

    static <T> Stream<Stream<T>> partitionBy(Stream<T> in, Function<T, ?> f) {
        Partitioner<T> partitioner = new Partitioner<>(f);
        return Stream.concat(in.flatMap(partitioner), delay(() -> partitioner.flush()));
    }
}

Also, as with sequences and reducers, when you transform a stream you don't create a "bigger" computation, you create a "bigger" source.

To be able to pass computations around, you've introduced xf, a function from Stream to Stream, to lift operations from methods to first-class entities (so as to untie them from the source). By doing so you've created a transducer, albeit one with too large an interface.

Below is a more general version of the above code that applies any (Clojure) transducer to a Stream:

import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Stream;
import java.util.stream.Stream.Builder;

import clojure.lang.AFn;
import clojure.lang.IFn;
import clojure.lang.Reduced;

public class StreamUtils {
    static <T> Stream<T> delay(final Supplier<Stream<T>> thunk) {
        return Stream.of((Object) null).flatMap(x -> thunk.get());
    }

    static class Transducer implements Function {
        IFn rf;

        public Transducer(IFn xf) {
            rf = (IFn) xf.invoke(new AFn() {
                public Object invoke(Object acc) {
                    return acc;
                }

                public Object invoke(Object acc, Object item) {
                    ((Builder<Object>) acc).accept(item);
                    return acc;
                }
            });
        }

        public Stream<?> apply(Object t) {
            if (rf == null) return Stream.empty();
            Object ret = rf.invoke(Stream.builder(), t);
            if (ret instanceof Reduced) {
                Reduced red = (Reduced) ret;
                Builder<?> sb = (Builder<?>) red.deref();
                return Stream.concat(sb.build(), flush());
            }
            return ((Builder<?>) ret).build();
        }

        Stream<?> flush() {
            if (rf == null) return Stream.empty();
            Builder<?> sb = (Builder<?>) rf.invoke(Stream.builder());
            rf = null;
            return sb.build();
        }
    }

    static <T> Stream<?> withTransducer(Stream<T> in, IFn xf) {
        Transducer transducer = new Transducer(xf);
        return Stream.concat(in.flatMap(transducer), delay(() -> transducer.flush()));
    }
}
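
To illustrate the remark above that a Stream-to-Stream function is a transducer with too large an interface, here is a minimal sketch of one possible reading: the function type admits element-wise pipelines like the question's xf, but it equally admits functions that consume the whole stream. The class name StreamFunctionDemo and the COUNT_ALL example are made up for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical demo class, not part of either answer.
class StreamFunctionDemo {
    // Element-wise pipeline, same shape as the question's xf.
    static final Function<Stream<Long>, Stream<Long>> XF =
            s -> s.filter(n -> n % 2L == 1L)
                  .map(n -> n + 1L)
                  .limit(5L);

    // Same type, but it runs a terminal operation on the whole input
    // and emits a single value. Nothing in the type rules this out.
    static final Function<Stream<Long>, Stream<Long>> COUNT_ALL =
            s -> Stream.of(Long.valueOf(s.count()));

    // Applies a Stream-to-Stream function to a fresh source of 0..99.
    static List<Long> run(Function<Stream<Long>, Stream<Long>> f) {
        return f.apply(LongStream.range(0L, 100L).boxed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(XF));        // [2, 4, 6, 8, 10]
        System.out.println(run(COUNT_ALL)); // [100]
    }
}
```

A Clojure transducer, by contrast, only ever sees the downstream reducing function, one element at a time, so it cannot take hold of the source in this way.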

Answer 2

Another important difference that I see is that Clojure transducers are composable. My stream pipelines are often a bit longer than in your example, and they contain intermediate steps that I could reuse elsewhere, e.g.:

someStream
   .map(...)
   .filter(...)
   .map(...)      // <- gee, there are at least two other
   .filter(...)   // <- pipelines where I could use the functionality
   .map(...)      // <- of just these three steps!
   .filter(...)
   .collect(...)

I haven't found a sane way to achieve that. What I wish I had is something like this:

Transducer<Integer,String> smallTransducer = s -> s.map(...); // usable in a stream Integer -> String
Transducer<String,MyClass> otherTransducer = s -> s.filter(...).map(...); // stream String -> MyClass
Transducer<Integer,MyClass> combinedTransducer = smallTransducer.then(otherTransducer); // compose transducers, to get an Integer -> MyClass transducer

and then use it like this:

someStream
   .map(...)
   .filter(...)
   .transduce(smallTransducer)
   .transduce(otherTransducer)
   .filter(...)
   .collect(...)

// or

someStream
   .map(...)
   .filter(...)
   .transduce(combinedTransducer)
   .filter(...)
   .collect(...)
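
For comparison, a rough approximation of the wished-for composition can be sketched with plain java.util.function.Function, in the same spirit as the question's xf. This is only a sketch under assumptions: the class ComposedStages, the stage names SMALL, OTHER and COMBINED, and the concrete lambdas are all hypothetical, and andThen stands in for the wished-for then.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example; the stages and their lambdas are placeholders.
class ComposedStages {
    // Reusable stage: Integer -> String.
    static final Function<Stream<Integer>, Stream<String>> SMALL =
            s -> s.map(i -> "#" + i);

    // Reusable stage: String -> Integer (filter, then map).
    static final Function<Stream<String>, Stream<Integer>> OTHER =
            s -> s.filter(str -> str.length() > 2)
                  .map(String::length);

    // Function.andThen composes the two stages into one.
    static final Function<Stream<Integer>, Stream<Integer>> COMBINED =
            SMALL.andThen(OTHER);

    static List<Integer> run(List<Integer> input) {
        // The composed stage has to wrap the upstream part of the pipeline,
        // so the fluent chain is interrupted at this point.
        return COMBINED.apply(input.stream().filter(i -> i > 0))
                .filter(len -> len < 6)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(-5, 7, 42, 1234, 99999))); // [3, 5]
    }
}
```

The stages compose and can be reused across pipelines, but the call site reads inside-out rather than as one chain, which may be part of why this does not feel like a sane solution.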

Answer 1
10 2016-02-01 22:33:10

Answer 2
1 2019-10-30 12:17:01