Java returning an ArrayList Slow?

I am returning an ArrayList from one object and using it in another. The application is multi-threaded, and each thread fills the list one int at a time from a file, so every add involves a call to the getter for the list. There are 200 threads, each with a file of 1 million ints. The application takes hours to run, and I assume this is my bottleneck, since a test with a local ArrayList takes 4 minutes. My problem is that this list is used everywhere and I need to synchronize on it. Is there a fast solution to this problem, or do I have to give each thread its own list and not return it?

Actually I was wrong: it is only faster when the list is local to the method. Declared anywhere else, such as at the top of the class, it takes hours to run. I'm stumped on this.

My return code looks like:

synchronized public ArrayList<Integer> getData() 
{
    return this.myData;
}

Here is what runs slowly. I removed everything else to benchmark just this part, and it takes hours:

    Scanner scanner = new Scanner(filePath);

    /*
     * While we have data keep reading
     * when out of data the simulation is complete.
     */
    while (scanner.hasNext()) 
    {
        /*
         * Get the data to simulate requests
         * and feed it to the algorithm being evaluated.
         */
        if (scanner.hasNextInt()) 
        {
            int temp = scanner.nextInt();
            //System.out.println( this.tClientName+" "+temp);


            /*
             * Add the temp value from incoming stream. 
             * 
             * todo:: UNLESS its NOT found on the client as a miss
             */
            tClientCache.getCache().add(temp); 

        } 
        else 
        {
            scanner.next();
        }
    }//END Of while (scanner.hasNext()) 
    /*
     * Close the scanner
     */
    scanner.close();

The issue is almost certainly not the act of returning the ArrayList, as that just returns a reference.

The most likely cause is synchronization overhead, as every call to that method needs to acquire a lock, get the data, then release the lock (with some caveats, but that's basically true).

Additionally, that synchronization almost certainly doesn't do what you want it to do: the actual access to the ArrayList needs to be synchronized, not just the act of getting a reference to it.
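To make that distinction concrete, here is a minimal sketch in which the lock is held around the list operation itself rather than around a getter. The class `GuardedCache` and its `fill` helper are hypothetical stand-ins, not the asker's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the asker's cache class.
class GuardedCache {
    private final List<Integer> data = new ArrayList<>();

    // The lock is held while the list is modified,
    // not merely while a reference to it is handed out.
    public synchronized void add(int value) {
        data.add(value);
    }

    public synchronized int size() {
        return data.size();
    }

    // Starts `threads` writers that each add `perThread` values
    // and returns the final size of the shared list.
    static int fill(int threads, int perThread) {
        GuardedCache cache = new GuardedCache();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    cache.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 10_000)); // prints 40000
    }
}
```

Note that this only addresses correctness: the lock is still taken once per element, so by itself it is unlikely to make the program faster.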

Generally speaking you've got two options:

  • reduce the number of synchronization points (i.e. synchronize less often), or
  • choose a more efficient synchronization mechanism.

Can your threads maybe collect a number of results and put them in in bulk (say a thousand at a time)? Or can you switch to a data structure that is better suited to multi-threading? (CopyOnWriteArrayList comes to mind, but that's optimized for frequent reading and very infrequent writing, so probably not for your use case.)
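As one possible illustration of the second option, the sketch below uses `java.util.concurrent.ConcurrentLinkedQueue`, which accepts concurrent writers without an explicit lock. This class is not named in the answer; it is an assumption chosen for illustration, and `QueueDemo` and `fill` are hypothetical names. It only fits if index-based access is not needed, and whether it is actually faster for this workload would have to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical demo class; not part of the asker's code.
class QueueDemo {
    // Starts `threads` writers that each add `perThread` values to one
    // shared queue, then returns how many elements the queue holds.
    static int fill(int threads, int perThread) {
        ConcurrentLinkedQueue<Integer> shared = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i); // safe without a synchronized block
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 10_000)); // prints 40000
    }
}
```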

If your concurrent function looks like this:

Scanner scanner = new Scanner(filePath);

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        tClientCache.getCache().add(temp);
    } else {
        scanner.next();
    }
}

scanner.close();

You can synchronise by using a common synchronisation object:

Scanner scanner = new Scanner(filePath);
Object syncObject = tClientCache.getSynchronizationObject();
ArrayList<Integer> list = tClientCache.getCache();

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        // synchronise manipulation
        synchronized(syncObject) {
            list.add(temp);
        }
    } else {
        scanner.next();
    }
}

scanner.close();

and extend your ClientCache with the following:

class ClientCache {
     ...
     public Object getSynchronizationObject() { return m_syncObj; }
     ...
     private final Object m_syncObj = new Object(); // For synchronised access to the cache.
}

Of course, you will then have to synchronise all other access to the cache as well while you are adding to the list. Consider rewriting your programme so that either the output of each file is processed independently, each in its own (unsynchronised) list, or, where you need to merge the data, you process the data in bulk:

Scanner scanner = new Scanner(filePath);
ArrayList<Integer> bulk = new ArrayList<>(); // local buffer, touched only by this thread
int threshold = ...

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        bulk.add(temp);
        // instead of an arbitrary threshold, why not merge the array of a whole file?
        if(bulk.size() >= threshold) {
            tClientCache.process(bulk);
            bulk.clear();
        }
    } else {
        scanner.next();
    }
}
if(!bulk.isEmpty()) {
    tClientCache.process(bulk);
}

scanner.close();

and perform the synchronisation in ClientCache.process:

class ClientCache {
    ...
    public void process(ArrayList<Integer> bulk) {
        // synchronise cache manipulation
        synchronized(getSynchronizationObject()) {
            // merge howsoever you like...
            getCache().addAll(bulk);
        }
    }
}
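The other option mentioned above, one unsynchronised list per file, could be sketched roughly as follows. Each task fills a list that only it touches, and the lists are combined once at the end on a single thread. The names `PerSourceLists`, `readInts` and `readAll` are hypothetical, the pool size of 4 is arbitrary, and plain strings stand in for the files so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; strings stand in for the asker's files.
class PerSourceLists {
    // Reads all ints from one source into a list that only this task touches.
    static List<Integer> readInts(String source) {
        List<Integer> local = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    local.add(scanner.nextInt());
                } else {
                    scanner.next(); // skip tokens that are not ints
                }
            }
        }
        return local;
    }

    // Parses every source on a worker thread, then merges on the calling thread.
    static List<Integer> readAll(List<String> sources) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<Integer>>> pending = new ArrayList<>();
            for (String source : sources) {
                pending.add(pool.submit(() -> readInts(source)));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> result : pending) {
                merged.addAll(result.get()); // no lock needed: single-threaded merge
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because the futures are collected in submission order, the merged list keeps the order of the sources, and no lock is taken while the data is being read.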

200 million int values are not a lot of data for current systems (<1 GB), but 200 million Integer objects take about 3 GB! Depending on what kind of processing you do on this data, the memory access might completely destroy your performance. Again, perform bulk-data processing where possible, and if you need to do high-performance work such as sorting, consider copying chunks of data into fixed-size int[] arrays, sorting those primitive arrays, and then merging the chunks back into your lists.
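A minimal sketch of the copy-and-sort step, assuming a hypothetical helper `toSortedArray`: the boxed values are copied into an int[] once, and the sort then runs on primitives only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration.
class PrimitiveSort {
    // Copies the boxed values into a primitive array and sorts it in place.
    static int[] toSortedArray(List<Integer> boxed) {
        int[] values = new int[boxed.size()];
        int i = 0;
        for (Integer value : boxed) {
            values[i++] = value; // unboxed once, during the copy
        }
        Arrays.sort(values); // sorts the primitive array without further boxing
        return values;
    }

    public static void main(String[] args) {
        int[] sorted = toSortedArray(Arrays.asList(3, 1, 2));
        System.out.println(Arrays.toString(sorted)); // prints [1, 2, 3]
    }
}
```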
