Java 8: One Stream to Multiple Maps

Question

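Let's say I have a large web server log file that does not fit in memory. I need to stream this file through a map-reduce step and save the results to a database, and I'm doing this with the Java 8 Stream API. For example, after the map-reduce step I get lists such as consumption per client, consumption per IP, and consumption per content. My real requirements are not exactly like this example, but since I can't share my code, I just want to give a basic one.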

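With the Java 8 Stream API, I want to read the file only once and build all 3 lists at the same time while streaming it, either sequentially or in parallel (parallel would be nice). Is there a way to do this?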
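Why a single pass needs special handling: a java.util.stream.Stream supports only one terminal operation, so you cannot simply call collect three times on one stream backed by the file. A minimal demonstration (the class and method names are illustrative only):

```java
import java.util.stream.Stream;

public class SingleUseStream {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondTerminalOpFails() {
        Stream<String> lines = Stream.of("a", "b", "c");
        lines.count(); // first terminal operation consumes the stream
        try {
            lines.count(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            // "stream has already been operated upon or closed"
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(secondTerminalOpFails()); // prints true
    }
}
```

So all three results have to be produced by one terminal operation, which is what both answers below do in different ways.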

Answer 1

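Collecting into anything beyond what the standard API offers is usually quite easy with a custom Collector. In your case, collecting into 3 lists at once looks like this (just a small example that compiles, since you can't share your code either):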

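import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collector;

private static <T> Collector<T, ?, List<List<T>>> to3Lists() {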
    class Acc {

        List<T> left = new ArrayList<>();

        List<T> middle = new ArrayList<>();

        List<T> right = new ArrayList<>();

        List<List<T>> list = Arrays.asList(left, middle, right);

        void add(T elem) {
            // obviously do whatever you want here
            left.add(elem);
            middle.add(elem);
            right.add(elem);
        }

        Acc merge(Acc other) {

            left.addAll(other.left);
            middle.addAll(other.middle);
            right.addAll(other.right);

            return this;
        }

        public List<List<T>> finisher() {
            return list;
        }

    }
    return Collector.of(Acc::new, Acc::add, Acc::merge, Acc::finisher);
}

and use it like this:

Stream.of(1, 2, 3)
      .collect(to3Lists());

Obviously, this custom collector does nothing useful; it is only an example of how to write and use one.
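Building on the same idea, the accumulator can delegate to three existing downstream collectors, so each "list" can itself be a groupingBy map computed in the same single pass, and delegating to each collector's combiner keeps it usable with parallel streams. This is a minimal sketch; tee3, Results3 and LogEntry are hypothetical names introduced here, not part of the original answer (Java 12 later added Collectors.teeing for the two-collector case):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collector;
import java.util.stream.Collectors;

public class Tee3Demo {

    // Hypothetical holder for the three results.
    static final class Results3<R1, R2, R3> {
        final R1 first;
        final R2 second;
        final R3 third;

        Results3(R1 first, R2 second, R3 third) {
            this.first = first;
            this.second = second;
            this.third = third;
        }
    }

    // Hypothetical parsed log line.
    static final class LogEntry {
        final String client;
        final String ip;
        final long bytes;

        LogEntry(String client, String ip, long bytes) {
            this.client = client;
            this.ip = ip;
            this.bytes = bytes;
        }
    }

    // Feeds every element to three downstream collectors in a single pass.
    static <T, A1, R1, A2, R2, A3, R3> Collector<T, ?, Results3<R1, R2, R3>> tee3(
            Collector<T, A1, R1> c1, Collector<T, A2, R2> c2, Collector<T, A3, R3> c3) {
        class Acc {
            A1 a1 = c1.supplier().get();
            A2 a2 = c2.supplier().get();
            A3 a3 = c3.supplier().get();

            void add(T t) {
                c1.accumulator().accept(a1, t);
                c2.accumulator().accept(a2, t);
                c3.accumulator().accept(a3, t);
            }

            // Called by parallel streams to combine partial results.
            Acc merge(Acc other) {
                a1 = c1.combiner().apply(a1, other.a1);
                a2 = c2.combiner().apply(a2, other.a2);
                a3 = c3.combiner().apply(a3, other.a3);
                return this;
            }

            Results3<R1, R2, R3> finish() {
                return new Results3<>(c1.finisher().apply(a1),
                                      c2.finisher().apply(a2),
                                      c3.finisher().apply(a3));
            }
        }
        return Collector.of(() -> new Acc(), Acc::add, Acc::merge, Acc::finish);
    }

    // Bytes per client, bytes per IP and line count, all from one parallel pass.
    static Results3<Map<String, Long>, Map<String, Long>, Long> summarize(List<LogEntry> entries) {
        Collector<LogEntry, ?, Map<String, Long>> bytesPerClient =
                Collectors.groupingBy(e -> e.client, Collectors.summingLong(e -> e.bytes));
        Collector<LogEntry, ?, Map<String, Long>> bytesPerIp =
                Collectors.groupingBy(e -> e.ip, Collectors.summingLong(e -> e.bytes));
        Collector<LogEntry, ?, Long> lineCount = Collectors.counting();
        return entries.parallelStream().collect(tee3(bytesPerClient, bytesPerIp, lineCount));
    }

    public static void main(String[] args) {
        List<LogEntry> entries = Arrays.asList(
                new LogEntry("client1", "10.0.0.1", 100),
                new LogEntry("client1", "10.0.0.2", 50),
                new LogEntry("client2", "10.0.0.1", 25));
        Results3<Map<String, Long>, Map<String, Long>, Long> r = summarize(entries);
        System.out.println(r.first.get("client1") + " " + r.second.get("10.0.0.1") + " " + r.third);
        // prints 150 125 3
    }
}
```

Because the accumulator never shares state between threads and partial results are merged only through the downstream combiners, no locking or extra threads are needed.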

Answer 2

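I adapted the answer to this question to your case. A custom Spliterator "splits" the stream into several streams, each of which is collected by a different property: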

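import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators.AbstractSpliterator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Both members below live inside an enclosing class of your choice.
@SafeVarargs
public static <T> long streamForked(Stream<T> source, Consumer<Stream<T>>... consumers)
{
    // Drives the source to completion; every element is copied into one queue per consumer.
    // Note: this returns as soon as the source is exhausted, while the consumer threads
    // may still be draining their queues.
    return StreamSupport.stream(new ForkingSpliterator<>(source, consumers), false).count();
}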

public static class ForkingSpliterator<T>
    extends AbstractSpliterator<T>
{
    private Spliterator<T>         sourceSpliterator;

    private List<BlockingQueue<T>> queues = new ArrayList<>();

    // volatile: written by the stream thread, read by the forked consumer threads
    private volatile boolean       sourceDone;

    @SafeVarargs
    private ForkingSpliterator(Stream<T> source, Consumer<Stream<T>>... consumers)
    {
        super(Long.MAX_VALUE, 0);

        sourceSpliterator = source.spliterator();

        for (Consumer<Stream<T>> fork : consumers)
        {
            LinkedBlockingQueue<T> queue = new LinkedBlockingQueue<>();
            queues.add(queue);
            new Thread(() -> fork.accept(StreamSupport.stream(new ForkedConsumer(queue), false))).start();
        }
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action)
    {
        sourceDone = !sourceSpliterator.tryAdvance(t -> queues.forEach(queue -> queue.offer(t)));
        return !sourceDone;
    }

    private class ForkedConsumer
        extends AbstractSpliterator<T>
    {
        private BlockingQueue<T> queue;

        private ForkedConsumer(BlockingQueue<T> queue)
        {
            super(Long.MAX_VALUE, 0);
            this.queue = queue;
        }

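        @Override
        public boolean tryAdvance(Consumer<? super T> action)
        {
            T next;
            // busy-wait until the source has offered an element or is exhausted
            while ((next = queue.poll()) == null)
            {
                if (sourceDone)
                {
                    // an element may have been offered between poll() and the sourceDone check,
                    // so look one last time before terminating this sub-stream
                    next = queue.poll();
                    if (next == null)
                    {
                        return false;
                    }
                    break;
                }
            }

            // push to consumer pipeline
            action.accept(next);
            return true;
        }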
    }
}

You can use it as follows:

streamForked(Stream.of(new Row("content1", "client1", "location1", 1),
                       new Row("content2", "client1", "location1", 2),
                       new Row("content1", "client1", "location2", 3),
                       new Row("content2", "client2", "location2", 4),
                       new Row("content1", "client2", "location2", 5)),
             rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getClient,
                                                                           Collectors.groupingBy(Row::getContent,
                                                                                                 Collectors.summingInt(Row::getConsumption))))),
             rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getClient,
                                                                           Collectors.groupingBy(Row::getLocation,
                                                                                                 Collectors.summingInt(Row::getConsumption))))),
             rows -> System.out.println(rows.collect(Collectors.groupingBy(Row::getContent,
                                                                           Collectors.groupingBy(Row::getLocation,
                                                                                                 Collectors.summingInt(Row::getConsumption))))));

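// Output (one line per consumer; the lines can appear in any order because each consumer runs on its own thread)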
// {client2={location2=9}, client1={location1=3, location2=3}}
// {client2={content2=4, content1=5}, client1={content2=2, content1=4}}
// {content2={location1=2, location2=4}, content1={location1=1, location2=8}}

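Note that you can do whatever you want with each copy of the stream. Following your example, I used nested groupingBy collectors to group the rows by two properties and then sum an int property, so each result is a Map<String, Map<String, Integer>>. But you can also use it for other scenarios: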

rows -> System.out.println(rows.count())
rows -> rows.forEach(row -> System.out.println(row))
rows -> System.out.println(rows.anyMatch(row -> row.getConsumption() > 3))
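The answer does not show the Row class used in the example. A minimal hypothetical version, consistent with the getters used above, could look like this; the constructor argument order (content, client, location, consumption) is inferred from the sample data and the printed output. It also computes the second consumer's grouping on the calling thread as a deterministic check:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RowDemo {

    // Hypothetical Row class matching the getters used in the example above.
    public static final class Row {
        private final String content;
        private final String client;
        private final String location;
        private final int consumption;

        public Row(String content, String client, String location, int consumption) {
            this.content = content;
            this.client = client;
            this.location = location;
            this.consumption = consumption;
        }

        public String getContent() { return content; }
        public String getClient() { return client; }
        public String getLocation() { return location; }
        public int getConsumption() { return consumption; }
    }

    static List<Row> sampleRows() {
        return Arrays.asList(new Row("content1", "client1", "location1", 1),
                             new Row("content2", "client1", "location1", 2),
                             new Row("content1", "client1", "location2", 3),
                             new Row("content2", "client2", "location2", 4),
                             new Row("content1", "client2", "location2", 5));
    }

    // Same grouping as the second consumer, without any extra threads.
    static Map<String, Map<String, Integer>> consumptionByClientAndLocation(List<Row> rows) {
        return rows.stream().collect(Collectors.groupingBy(Row::getClient,
                Collectors.groupingBy(Row::getLocation, Collectors.summingInt(Row::getConsumption))));
    }

    public static void main(String[] args) {
        System.out.println(consumptionByClientAndLocation(sampleRows()).get("client2").get("location2"));
        // prints 9, matching client2={location2=9} in the output above
    }
}
```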


Answer 1: score 7, posted 2018-08-13 08:58:28

Answer 2 (accepted): score 4, posted 2018-08-13 09:42:56
