Is it possible to do a lazy groupby, returning a stream, in Java 8?

Question

I have some fairly large text files that I want to process by grouping their lines.

I tried to use the new Stream features, like this:

return FileUtils.readLines(...) 
            .parallelStream()
            .map(...)
            .collect(groupingBy(pair -> pair[0]));

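The problem is that, AFAIK, this generates a Map.

Is there any way to write high-level code like the above that generates, for example, a Stream of entries?

UPDATE: What I'm looking for is something like Python's itertools.groupby. My files are already sorted (by pair[0]); I just want to load the groups one by one.

I already have an iterative solution. I'm just wondering whether there is a more declarative way to do this. By the way, using Guava or another third-party library wouldn't be a big problem.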

Answer 1

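The task you want to achieve is quite different from what grouping does. groupingBy does not rely on the order of the Stream's elements but on the Map's algorithm applied to the classifier Function's result.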
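
To make the difference concrete, here is a small sketch (the class and method names are made up for this illustration) showing that groupingBy puts elements with the same key into one group even when they are not adjacent in the input:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupingByDemo {
    // Groups words by their first character, wherever they occur in the input.
    static Map<Character, List<String>> byFirstChar(List<String> words) {
        return words.stream().collect(Collectors.groupingBy(w -> w.charAt(0)));
    }

    public static void main(String[] args) {
        // "apple" and "avocado" are not neighbours, yet they share one group.
        List<String> words = Arrays.asList("apple", "banana", "avocado", "blueberry");
        System.out.println(byFirstChar(words));
    }
}
```

The whole input has to be consumed before the Map is complete, which is why this cannot be lazy.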

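What you want is to fold adjacent items that have a common property value into a single List item. The Stream does not even need to be sorted by that property, as long as you can guarantee that all items with the same property value are clustered together.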

Maybe it is possible to formulate this task as a reduction, but to me the resulting structure looks too complicated.
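
For illustration only, a reduction along these lines could be sketched as a custom Collector. The names AdjacentGrouping, adjacentGroups and append are invented for this example. Note that it is eager: the complete list of groups is built before anything is returned, so it does not give you a lazy Stream of groups.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Stream;

class AdjacentGrouping {
    // Hypothetical helper: folds runs of adjacent elements that share a key
    // into (key, list) entries, preserving encounter order.
    static <T, G> Collector<T, ?, List<Map.Entry<G, List<T>>>> adjacentGroups(
            Function<? super T, ? extends G> key) {
        return Collector.<T, List<Map.Entry<G, List<T>>>>of(
            ArrayList::new,
            (acc, item) -> append(acc, key.apply(item), item),
            (left, right) -> {
                // Merge partial results; boundary runs with equal keys are joined.
                for (Map.Entry<G, List<T>> e : right)
                    for (T item : e.getValue())
                        append(left, e.getKey(), item);
                return left;
            });
    }

    // Adds the item to the last run if the key matches, otherwise starts a new run.
    private static <T, G> void append(List<Map.Entry<G, List<T>>> acc, G group, T item) {
        if (!acc.isEmpty()) {
            Map.Entry<G, List<T>> last = acc.get(acc.size() - 1);
            if (Objects.equals(last.getKey(), group)) {
                last.getValue().add(item);
                return;
            }
        }
        List<T> run = new ArrayList<>();
        run.add(item);
        acc.add(new AbstractMap.SimpleImmutableEntry<>(group, run));
    }

    public static void main(String[] args) {
        // Input is clustered by key (the tens digit) but not sorted.
        List<Map.Entry<Integer, List<Integer>>> groups =
            Stream.of(31, 32, 10, 11, 25).collect(adjacentGroups(i -> i / 10));
        System.out.println(groups); // prints [3=[31, 32], 1=[10, 11], 2=[25]]
    }
}
```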

So, unless direct support for this feature gets added to Stream, an iterator-based approach looks most pragmatic to me:

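import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Spliterator that lazily folds adjacent items with an equal group key into one entry.
class Folding<T,G> implements Spliterator<Map.Entry<G,List<T>>> {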
    static <T,G> Stream<Map.Entry<G,List<T>>> foldBy(
            Stream<? extends T> s, Function<? super T, ? extends G> f) {
        return StreamSupport.stream(new Folding<>(s.spliterator(), f), false);
    }
    private final Spliterator<? extends T> source;
    private final Function<? super T, ? extends G> pf;
    private final Consumer<T> c=this::addItem;
    private List<T> pending, result;
    private G pendingGroup, resultGroup;

    Folding(Spliterator<? extends T> s, Function<? super T, ? extends G> f) {
        source=s;
        pf=f;
    }
    private void addItem(T item) {
        G group=pf.apply(item);
        if(pending==null) pending=new ArrayList<>();
        else if(!pending.isEmpty()) {
            if(!Objects.equals(group, pendingGroup)) {
                if(pending.size()==1)
                    result=Collections.singletonList(pending.remove(0));
                else {
                    result=pending;
                    pending=new ArrayList<>();
                }
                resultGroup=pendingGroup;
            }
        }
        pendingGroup=group;
        pending.add(item);
    }
    public boolean tryAdvance(Consumer<? super Map.Entry<G, List<T>>> action) {
        while(source.tryAdvance(c)) {
            if(result!=null) {
                action.accept(entry(resultGroup, result));
                result=null;
                return true;
            }
        }
        if(pending!=null) {
            action.accept(entry(pendingGroup, pending));
            pending=null;
            return true;
        }
        return false;
    }
    private Map.Entry<G,List<T>> entry(G g, List<T> l) {
        return new AbstractMap.SimpleImmutableEntry<>(g, l);
    }
    public int characteristics() { return 0; }
    public long estimateSize() { return Long.MAX_VALUE; }
    public Spliterator<Map.Entry<G, List<T>>> trySplit() { return null; }
}

The lazy nature of the resulting folded Stream is best demonstrated by applying it to an infinite stream:

Folding.foldBy(Stream.iterate(0, i->i+1), i->i>>4)
       .filter(e -> e.getKey()>5)
       .findFirst().ifPresent(e -> System.out.println(e.getValue()));
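
Closer to the use case in the question, the same idea can be sketched with a plain Iterator wrapped into a Stream. This is a simplified variant for illustration, not the Folding class above; LazyRuns and runs are hypothetical names, and the sample lines are assumed to be clustered by their first token.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazyRuns {
    // Hypothetical helper: each next() call reads just enough source elements
    // to complete one run of equal keys (plus one lookahead element).
    static <T, G> Stream<Map.Entry<G, List<T>>> runs(
            Iterator<? extends T> source, Function<? super T, ? extends G> key) {
        Iterator<Map.Entry<G, List<T>>> it = new Iterator<Map.Entry<G, List<T>>>() {
            private T lookahead;            // first element of the next run, if any
            private G lookaheadGroup;       // its key
            private boolean hasLookahead;

            @Override public boolean hasNext() {
                return hasLookahead || source.hasNext();
            }

            @Override public Map.Entry<G, List<T>> next() {
                if (!hasNext()) throw new NoSuchElementException();
                if (!hasLookahead) {
                    lookahead = source.next();
                    lookaheadGroup = key.apply(lookahead);
                }
                G group = lookaheadGroup;
                List<T> run = new ArrayList<>();
                run.add(lookahead);
                hasLookahead = false;
                while (source.hasNext()) {
                    T item = source.next();
                    G itemGroup = key.apply(item);
                    if (!Objects.equals(group, itemGroup)) {
                        lookahead = item;   // belongs to the following run
                        lookaheadGroup = itemGroup;
                        hasLookahead = true;
                        break;
                    }
                    run.add(item);
                }
                return new AbstractMap.SimpleImmutableEntry<>(group, run);
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(it, Spliterator.ORDERED), false);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a 1", "a 2", "b 3", "c 4", "c 5");
        runs(lines.iterator(), line -> line.split(" ")[0])
            .limit(2)                       // only the first two groups are built
            .forEach(System.out::println);  // prints a=[a 1, a 2] then b=[b 3]
    }
}
```

With a real file, the iterator could come from a line-by-line reader instead of a List, so that only one group plus one lookahead line is held in memory at a time.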

Answer 2

cyclops-react, a library I contribute to, offers both sharding and grouping functionality that might do what you want.

  ReactiveSeq<ListX<TYPE>> grouped = ReactiveSeq.fromCollection(FileUtils.readLines(...) )
             .groupedStatefullyWhile((batch,next) ->  batch.size()==0 ? true : next.equals(batch.get(0)));

The groupedStatefullyWhile operator allows elements to be grouped based on the current state of the batch. ReactiveSeq is a single-threaded sequential Stream.

  Map<Key, Stream<Value>> sharded = 
                  new LazyReact()
                 .fromCollection(FileUtils.readLines(...) )
                 .map(..)
                 .shard(shards, pair -> pair[0]);

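This will create a LazyFutureStream (which implements java.util.stream.Stream) that processes the data in the file asynchronously and in parallel. It is lazy and won't start processing until data is pulled through.

The only caveat is that you need to define the shards beforehand. That is, the "shards" parameter above is a Map of async.Queue instances, keyed by the shard key (possibly whatever pair[0] is?).

For example: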

Map<Integer,Queue<String>> shards;

There is a sharding example with a video and test code here.

Answer 3

This can be done with collapse in StreamEx:

final int[][] aa = { { 1, 1 }, { 1, 2 }, { 2, 2 }, { 2, 3 }, { 3, 3 }, { 4, 4 } };

StreamEx.of(aa)
        .collapse((a, b) -> a[0] == b[0], Collectors.groupingBy(a -> a[0]))
        .forEach(System.out::println);

We can add peek and limit to verify that the evaluation is lazy:

StreamEx.of(aa)
        .peek(System.out::println)
        .collapse((a, b) -> a[0] == b[0], Collectors.groupingBy(a -> a[0]))
        .limit(1)
        .forEach(System.out::println);

3 answers

Answer 1
3, accepted, 2014-09-04 10:40:39

Answer 2
1, 2015-03-29 21:17:59

Answer 3
0, 2017-06-12 19:52:29
