
Java returning an ArrayList Slow?

I am returning an ArrayList from one object and using it in another. The application is multi-threaded, and each thread fills the ArrayList one int at a time from a file, so each add requires a get of the ArrayList. There are 200 threads, each with a file of 1 million ints. The application takes hours to run, and I assume this is my bottleneck, since when I test with a local ArrayList it takes 4 minutes. My problem is that this list is used everywhere, so I need to synchronize on it. Is there a fast solution to this problem, or do I have to give each thread its own ArrayList and not return it?

Actually I was wrong: it is only fast when the ArrayList is local to the method. Anywhere else, such as declared at the top of the class, it takes hours to run. I'm stumped on this.

My return code looks like:

synchronized public ArrayList<Integer> getData() 
{
    return this.myData;
}

Here is what runs slow. I removed everything else and am trying to benchmark just this, and it still takes hours:

    Scanner scanner = new Scanner(filePath);

    /*
     * While we have data keep reading
     * when out of data the simulation is complete.
     */
    while (scanner.hasNext()) 
    {
        /*
         * Get the data to simulate requests
         * and feed it to the algorithm being evaluated.
         */
        if (scanner.hasNextInt()) 
        {
            int temp = scanner.nextInt();
            //System.out.println( this.tClientName+" "+temp);


            /*
             * Add the temp value from incoming stream. 
             * 
             * todo:: UNLESS its NOT found on the client as a miss
             */
            tClientCache.getCache().add(temp); 

        } 
        else 
        {
            scanner.next();
        }
    }//END Of while (scanner.hasNext()) 
    /*
     * Close the scanner
     */
    scanner.close();

The issue is almost certainly not the act of returning the ArrayList, as that just returns a reference.

The most likely cause is the synchronization overhead, as every call to that method needs to acquire the lock, get the data, then release the lock (with some caveats, but that's basically true).

Additionally, that synchronization almost certainly doesn't do what you want it to do, because it is the actual access to the ArrayList that needs to be synchronized, not just the act of getting a reference to it.
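To illustrate the point: one common fix is to wrap the list with Collections.synchronizedList, so that every add itself takes the lock rather than only the getter. This is a minimal sketch (class name and thread counts are made up, not from the question):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListDemo {
    public static void main(String[] args) throws InterruptedException {
        // The wrapper makes every add() acquire the list's lock,
        // not just the method that returns the reference.
        List<Integer> data = Collections.synchronizedList(new ArrayList<>());

        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    data.add(i); // each add is individually synchronized
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // With the access itself synchronized, no adds are lost.
        System.out.println(data.size()); // 40000
    }
}
```

Note that this still pays one lock acquisition per add, so it fixes correctness but not necessarily the performance problem described below.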

Generally speaking you've got two options:

  • reduce the number of synchronization points (i.e. synchronize less often), or
  • choose a more efficient synchronization mechanism.

Can your threads maybe collect a number of results and add them in bulk (say a thousand at a time)? Or can you switch to a more multi-threading-capable data structure? (CopyOnWriteArrayList comes to mind, but that's optimized for frequent reading and very infrequent writing, so probably not for your use case.)
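As a sketch of what a more concurrency-friendly structure could look like (CopyOnWriteArrayList is ruled out above, so this uses ConcurrentLinkedQueue from the same java.util.concurrent package; the class and counts here are illustrative, not from the question):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class QueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // A lock-free queue: many writer threads can add concurrently
        // without contending on a single shared monitor.
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();

        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 25_000; i++) {
                    queue.add(i);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Drain into a plain ArrayList once all writers are done.
        List<Integer> result = new ArrayList<>(queue);
        System.out.println(result.size()); // 100000
    }
}
```

Whether this beats coarse locking depends on contention; for 200 writer threads it avoids the serialised lock handoff that a synchronized list would impose.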

If your concurrent function looks like this:

Scanner scanner = new Scanner(filePath);

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        tClientCache.getCache().add(temp);
    } else {
        scanner.next();
    }
}

scanner.close();

You can synchronise by using a common synchronisation object:

Scanner scanner = new Scanner(filePath);
Object syncObject = tClientCache.getSynchronizationObject();
ArrayList<Integer> list = tClientCache.getCache();

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        // synchronise manipulation
        synchronized(syncObject) {
            list.add(temp);
        }
    } else {
        scanner.next();
    }
}

scanner.close();

and extend your ClientCache by the following:

class ClientCache {
     ...
     public Object getSynchronizationObject() { return m_syncObj; }
     ...
     private Object m_syncObj = new Object(); // For synchronised access to the cache.
}

Of course, this way you'll have to synchronise all other access to the cache too while you're adding to the list. Consider rewriting your program so that either the output of each file is processed independently, each in its own (unsynchronised) list, or, in case you need to merge the data, you process the data in bulks:

Scanner scanner = new Scanner(filePath);
ArrayList<Integer> bulk = new ArrayList<>();
int threshold = ...

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        bulk.add(temp);
        // instead of an arbitrary threshold, why not merge the array of a whole file?
        if(bulk.size() >= threshold) {
            tClientCache.process(bulk);
            bulk.clear();
        }
    } else {
        scanner.next();
    }
}
if(!bulk.isEmpty()) {
    tClientCache.process(bulk);
}

scanner.close();

and perform the synchronisation in ClientCache.process:

class ClientCache {
    ...
    public void process(ArrayList<Integer> bulk) {
        // synchronise cache manipulation
        synchronized(getSynchronizationObject()) {
            // merge howsoever you like...
            getCache().addAll(bulk);
        }
    }
}
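The other option mentioned above, one private list per thread merged only after the threads finish, might be sketched like this (class name, thread count, and data volume are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;

public class PerThreadLists {
    public static void main(String[] args) throws InterruptedException {
        int threadCount = 4;
        // One private, unsynchronised list per thread.
        List<List<Integer>> partials = new ArrayList<>();
        for (int t = 0; t < threadCount; t++) {
            partials.add(new ArrayList<>());
        }

        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            List<Integer> mine = partials.get(t);
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    mine.add(i); // no locking: only this thread touches the list
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }

        // Merge once, after all threads finish: a single, uncontended step.
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> p : partials) {
            merged.addAll(p);
        }
        System.out.println(merged.size()); // 40000
    }
}
```

This removes synchronisation from the hot loop entirely, at the cost of holding all partial results in memory until the merge.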

200 million int values is not a lot of data for current systems (<1 GB), but 200 million Integer objects amount to about 3 GB! Depending on what kind of processing you do on this data, the memory access might completely destroy your performance. Again, perform bulk data processing where possible, and if you need to do high-performance work like sorting, consider copying bulks of data into fixed-size int[] arrays, performing the sort on the primitive array, then merging those bulks back into your lists.
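The boxed-to-primitive copy described above can be sketched as follows (the sample values are arbitrary):

```java
import java.util.Arrays;
import java.util.List;

public class PrimitiveSortDemo {
    public static void main(String[] args) {
        List<Integer> boxed = List.of(42, 7, 19, 3, 99, 1);

        // Copy into a primitive array: one contiguous memory block,
        // no per-element object headers or pointer chasing.
        int[] raw = new int[boxed.size()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = boxed.get(i); // auto-unboxing
        }

        // Sort the primitive array; Arrays.sort on int[] is cache-friendly.
        Arrays.sort(raw);

        System.out.println(Arrays.toString(raw)); // [1, 3, 7, 19, 42, 99]
    }
}
```

For the 1-million-int files in the question, sorting the int[] and then merging the sorted bulks would keep the expensive inner work entirely on primitives.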
