
Java: Creating chunks of a List for processing

I have a list with a large number of elements. While processing this list, in some cases I want it partitioned into smaller sub-lists, and in other cases I want to process the entire list at once.

private void processList(List<X> entireList, int partitionSize)
{
    Iterator<X> entireListIterator = entireList.iterator();
    // Guava: com.google.common.collect.Iterators
    Iterator<List<X>> chunkOfEntireList = Iterators.partition(entireListIterator, partitionSize);
    while (chunkOfEntireList.hasNext()) {
        doSomething(chunkOfEntireList.next());
        // Runs between chunks only, never after the last one
        if (chunkOfEntireList.hasNext()) {
            doSomethingOnlyIfTheresMore();
        }
    }
}

I'm using com.google.common.collect.Iterators to create the partitions. So when I want to partition the list into chunks of size 100, I call:

processList(entireList, 100);

Now, when I don't want to create chunks of the list, I thought I could pass Integer.MAX_VALUE as partitionSize:

processList(entireList, Integer.MAX_VALUE);

But this makes my code run out of memory. Can someone help me out? What am I missing? What is Iterators doing internally, and how do I overcome this?

EDIT: I also need the "if" clause inside the loop, so that something is done only if there are more lists to process; i.e. I need the iterator's hasNext() function.

You're getting an out-of-memory error because Iterators.partition() internally populates an array of the given partition length. The array is always allocated at the full partition size because the actual number of elements is not known until the iteration is complete, so a partition size of Integer.MAX_VALUE requests an array of roughly two billion references. (The issue could have been prevented if an ArrayList had been used internally; I guess the designers decided that arrays would offer better performance in the common case.)
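To illustrate the ArrayList alternative mentioned above, here is a minimal plain-JDK sketch. GrowingPartition and its partition() method are hypothetical names, not part of Guava, and this is not Guava's implementation; it only shows that a growing list lets memory follow the number of elements actually read rather than the requested size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical helper (not part of Guava): partitions an iterator into chunks
// backed by a growing ArrayList instead of a pre-sized array.
final class GrowingPartition {
    private GrowingPartition() {}

    static <T> Iterator<List<T>> partition(final Iterator<T> source, final int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return new Iterator<List<T>>() {
            @Override
            public boolean hasNext() {
                // There is another chunk exactly when the source has elements left.
                return source.hasNext();
            }

            @Override
            public List<T> next() {
                if (!source.hasNext()) {
                    throw new NoSuchElementException();
                }
                // No capacity hint: the list grows with the elements actually read,
                // so size == Integer.MAX_VALUE does not cause a huge up-front allocation.
                List<T> chunk = new ArrayList<>();
                while (chunk.size() < size && source.hasNext()) {
                    chunk.add(source.next());
                }
                return chunk;
            }
        };
    }
}
```

With this sketch, GrowingPartition.partition(entireList.iterator(), Integer.MAX_VALUE) yields a single chunk holding every element, and hasNext() still works for the "only if there's more" check. The trade-off is that each chunk is a copy, unlike a subList view.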

Using Lists.partition() avoids the problem, since it delegates to List.subList(), which is only a view of the underlying list:

private void processList(List<X> entireList, int partitionSize) {
    for (List<X> chunk : Lists.partition(entireList, partitionSize)) {
        doSomething(chunk);
    }
}
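The loop above drops the "only if there's more" step from the question. As a rough plain-JDK sketch of the same subList idea that keeps that step, the following walks the list by index. SubListChunks and the two callback parameters are assumptions made for illustration, standing in for the question's doSomething() and doSomethingOnlyIfTheresMore(); this is not how Guava is implemented.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: chunks a list with subList() views, so no chunk is copied
// and a very large partitionSize allocates nothing extra.
final class SubListChunks {
    private SubListChunks() {}

    static <T> void processList(List<T> entireList, int partitionSize,
                                Consumer<List<T>> doSomething,
                                Runnable doSomethingOnlyIfTheresMore) {
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive");
        }
        int total = entireList.size();
        int from = 0;
        while (from < total) {
            // long arithmetic avoids int overflow when partitionSize is near Integer.MAX_VALUE
            int to = (int) Math.min((long) from + partitionSize, (long) total);
            doSomething.accept(entireList.subList(from, to));
            from = to;
            if (from < total) {
                // Runs between chunks only, never after the last one.
                doSomethingOnlyIfTheresMore.run();
            }
        }
    }
}
```

Calling SubListChunks.processList(entireList, Integer.MAX_VALUE, chunk -> ..., () -> ...) then handles the whole list as one chunk and never runs the between-chunks callback.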

Normally, partitioning allocates a new list of the given partitionSize, so an out-of-memory error is to be expected in this case. Why not use the original list when you need only a single partition? Possible solutions:

  1. Create a separate overloaded method that does not take the size.
  2. Pass the size as -1 when you don't need any partitioning. In the method, check the value; if it is -1, put the original list into chunkOfEntireList as its only chunk.
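Both options can be sketched together in plain JDK code. ChunkPlanner, chunksOf() and the NO_PARTITION constant are hypothetical names introduced for this illustration; the chunks are subList views, which is an assumption of this sketch rather than something this answer prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper showing both suggestions: an overload without a size,
// and a -1 sentinel meaning "do not partition".
final class ChunkPlanner {
    static final int NO_PARTITION = -1;

    private ChunkPlanner() {}

    // Option 1: overload without a size; the original list is the only chunk.
    static <T> List<List<T>> chunksOf(List<T> entireList) {
        if (entireList.isEmpty()) {
            return Collections.emptyList();
        }
        return Collections.singletonList(entireList);
    }

    // Option 2: a sentinel size routes to the no-partition path.
    static <T> List<List<T>> chunksOf(List<T> entireList, int partitionSize) {
        if (partitionSize == NO_PARTITION) {
            return chunksOf(entireList);
        }
        if (partitionSize <= 0) {
            throw new IllegalArgumentException("partitionSize must be positive or NO_PARTITION");
        }
        List<List<T>> chunks = new ArrayList<>();
        int total = entireList.size();
        int from = 0;
        while (from < total) {
            // long arithmetic avoids int overflow for very large partition sizes
            int to = (int) Math.min((long) from + partitionSize, (long) total);
            chunks.add(entireList.subList(from, to)); // a view, not a copy
            from = to;
        }
        return chunks;
    }
}
```

The caller can then iterate over the returned chunks with an ordinary Iterator, so the hasNext() check from the question still applies in both modes.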

Assuming you are trying to achieve parallelism by processing chunks of your list in parallel, it might be better to consider something like MapReduce or Spark, a bigger framework that includes process management.

However, as part of a monolithic application, you can consider node-local variants of it, possibly including Java 8 Streams. Take note of the parallelStream() method that is also available on your List<X>.
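As a small sketch of the parallelStream() route, assuming the work can be expressed per element rather than per chunk (which the question does not confirm): ParallelSketch and process() are hypothetical stand-ins for the question's doSomething().

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example: lets the stream framework split the list across threads
// instead of partitioning it by hand.
final class ParallelSketch {
    private ParallelSketch() {}

    // Placeholder per-element work; assumed to be independent and thread-safe.
    static int process(int x) {
        return x * x;
    }

    static List<Integer> processAll(List<Integer> entireList) {
        return entireList.parallelStream()
                .map(ParallelSketch::process)
                // Collecting to a list keeps the list's encounter order.
                .collect(Collectors.toList());
    }
}
```

This only pays off when the per-element work is expensive enough to outweigh the coordination overhead, and it does not offer the "between chunks" hook from the question.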
