Is returning an ArrayList slow in Java?

Question

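I return an ArrayList from one object and use it in another. The application is multithreaded, and each thread fills the ArrayList one int at a time from a file, so every add is preceded by a get of the ArrayList. There are 200 threads, each handling 1 million ints. The application takes hours to run, and I assume this is my bottleneck, because when I test with a local ArrayList it takes 4 minutes. My problem is that the list is used everywhere and I need to synchronize on it. Is there a quick fix for this, or do I have to give each thread its own ArrayList and not return it?

Actually, I was wrong: it is only fast when the list is local to the method. Anywhere else, for example declared at the top of the class, it takes hours to run, and I'm frustrated.

My return code looks like this: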

synchronized public ArrayList<Integer> getData() 
{
    return this.myData;
}

This is the part that runs slowly. I stripped out everything else to benchmark just this, and it takes hours:

    Scanner scanner = new Scanner(filePath);

    /*
     * While we have data keep reading
     * when out of data the simulation is complete.
     */
    while (scanner.hasNext()) 
    {
        /*
         * Get the data to simulate requests
         * and feed it to the algorithm being evaluated.
         */
        if (scanner.hasNextInt()) 
        {
            int temp = scanner.nextInt();
            //System.out.println( this.tClientName+" "+temp);


            /*
             * Add the temp value from incoming stream. 
             * 
             * todo:: UNLESS its NOT found on the client as a miss
             */
            tClientCache.getCache().add(temp); 

        } 
        else 
        {
            scanner.next();
        }
    }//END Of while (scanner.hasNext()) 
    /*
     * Close the scanner
     */
    scanner.close();

Answer 1

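The problem is almost certainly not the act of returning the ArrayList, since that only returns a reference.

The most likely culprit is synchronization overhead, because every call to the method has to acquire the lock, fetch the data, and release the lock again (there are some caveats, but that is essentially correct).

Also, the synchronization almost certainly doesn't do what you want it to do, because it is the actual accesses to the ArrayList that need to be synchronized, not just the act of getting a reference to it.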
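To illustrate the last point, here is a minimal sketch in which the lock covers the mutation itself rather than the getter. The class GuardedIntList and its method names are hypothetical and do not come from the question's code; the list never leaves the wrapper, so every access has to go through the lock.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: the list is never handed out, so all access is guarded.
final class GuardedIntList {
    private final List<Integer> data = new ArrayList<>();

    // The lock is held while the list is modified, not merely while a reference is returned.
    public synchronized void add(int value) {
        data.add(value);
    }

    public synchronized int size() {
        return data.size();
    }

    // Starts `threads` writers that each add `perThread` values, waits for them,
    // and returns the final size of the shared list.
    static int fillConcurrently(int threads, int perThread) {
        GuardedIntList list = new GuardedIntList();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return list.size();
    }

    public static void main(String[] args) {
        System.out.println(fillConcurrently(4, 10_000));
    }
}
```

This makes the adds safe, but it still takes the lock once per element, so on its own it addresses correctness rather than speed.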

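In general, you have two options:

Reduce the number of synchronization points (that is, synchronize less often), or
choose a more efficient synchronization mechanism.

Can your threads collect a number of results and add them in bulk (say, a thousand at a time)? Alternatively, you could switch to a data structure better suited to multithreaded use (CopyOnWriteArrayList comes to mind, but it is optimized for frequent reads and very infrequent writes, so it is probably not a good fit for your use case).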

Answer 2

If your concurrent function looks like this:

Scanner scanner = new Scanner(filePath);

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        tClientCache.getCache().add(temp);
    } else {
        scanner.next();
    }
}

scanner.close();

you can synchronize using a common synchronization object:

Scanner scanner = new Scanner(filePath);
Object syncObject = tClientCache.getSynchronizationObject();
ArrayList<Integer> list = tClientCache.getCache();

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        // synchronise manipulation
        synchronized(syncObject) {
            list.add(temp);
        }
    } else {
        scanner.next();
    }
}

scanner.close();

and extend your CacheClient with:

class CacheClient {
     ...
     public Object getSynchronizationObject() { return m_syncObj; }
     ...
     private Object m_syncObj = new Object(); // For synchronised access to the cache.
}

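Of course, you then have to synchronize every other access to the cache as well, not only the adds. Consider rewriting your program so that each file's output is processed independently and goes into its own (unsynchronized) list, or, if the data has to be merged, process the data in bulk: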

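Scanner scanner = new Scanner(filePath);
// Thread-local buffer; only this thread touches it, so it needs no locking.
ArrayList<Integer> bulk = new ArrayList<Integer>();
int threshold = ...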

while(scanner.hasNext()) {
    if(scanner.hasNextInt()) {
        int temp = scanner.nextInt();
        bulk.add(temp);
        // instead of an arbitrary threshold, why not merge the array of a whole file?
        if(bulk.size() >= threshold) {
            tClientCache.process(bulk);
            bulk.clear();
        }
    } else {
        scanner.next();
    }
}
if(!bulk.isEmpty()) {
    tClientCache.process(bulk);
}

scanner.close();

and perform the synchronization in ClientCache.process:

class ClientCache {
    ...
    public void process(ArrayList<Integer> bulk) {
        // synchronise cache manipulation
        synchronized(getSynchronizationObject()) {
            // merge howsoever you like...
            getCache().addAll(bulk);
        }
    }
}
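The other option mentioned above, giving each file its own unsynchronized list and merging only at the end, could look roughly like the sketch below. It is not part of the original answer: the class PerTaskLists, the methods parse and loadAll, and the use of an ExecutorService are all assumptions made for illustration, and each "file" is simulated by a String so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerTaskLists {
    // Parses one input into a private list. No other thread sees it, so no lock is needed.
    static List<Integer> parse(String input) {
        List<Integer> local = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNext()) {
                if (scanner.hasNextInt()) {
                    local.add(scanner.nextInt());
                } else {
                    scanner.next(); // skip tokens that are not integers
                }
            }
        }
        return local;
    }

    // Runs one parse task per input and merges the results on the calling thread,
    // in submission order.
    static List<Integer> loadAll(List<String> inputs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<List<Integer>>> futures = new ArrayList<>();
            for (String input : inputs) {
                Callable<List<Integer>> task = () -> parse(input);
                futures.add(pool.submit(task));
            }
            List<Integer> merged = new ArrayList<>();
            for (Future<List<Integer>> future : futures) {
                merged.addAll(future.get()); // single-threaded merge, no locking
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

With this shape the worker threads never share a list while parsing; the only coordination is waiting on each Future before merging.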

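200 million ints is not much data for a current system (< 1 GB), but 200 million Integer objects is roughly 3 GB! Depending on the kind of processing you do on this data, memory access may ruin your performance entirely. Again, process the data in bulk wherever possible, and if you need to run performance-critical operations such as sorting, consider copying chunks of the data into a fixed-size int[], sorting the primitive array, and then merging those chunks back into your lists.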
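As a rough check of those figures: 200,000,000 ints at 4 bytes each is 800 MB. If one assumes about 16 bytes per Integer object (typical for a 64-bit HotSpot JVM with compressed references, which the answer does not state), 200,000,000 × 16 bytes is about 3.2 GB, before counting the references stored in the list itself. The sketch below shows the suggested primitive-array sort for a single chunk; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class PrimitiveSortSketch {
    // Copies a chunk of boxed values into an int[] and sorts it there,
    // so the sort compares primitives instead of chasing Integer references.
    static int[] sortChunk(List<Integer> chunk) {
        int[] raw = new int[chunk.size()];
        for (int i = 0; i < raw.length; i++) {
            raw[i] = chunk.get(i); // auto-unboxing
        }
        Arrays.sort(raw);
        return raw;
    }

    public static void main(String[] args) {
        int[] sorted = sortChunk(Arrays.asList(5, 1, 4));
        System.out.println(Arrays.toString(sorted));
    }
}
```

How the sorted chunks are merged back depends on what the cache needs, so that step is left out here.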

Answer 1
0 2015-11-05 16:58:29

Answer 2
0 Accepted 2015-11-09 15:47:54
