Heap size issue: memory management in Java

Question

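My application contains the following code, which does two things:

Parse a file containing n records.

For each record in the file, make two web service calls.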

 public static List<String> parseFile(String fileName) {
   List<String> idList = new ArrayList<String>();
   try {
     BufferedReader cfgFile = new BufferedReader(new FileReader(new File(fileName)));
     String line = null;
     cfgFile.readLine();
     while ((line = cfgFile.readLine()) != null) {
       if (!line.trim().equals("")) {
         String [] fields = line.split("\\|"); 
         idList.add(fields[0]);
       } 
     } 
     cfgFile.close();
   } catch (IOException e) {
     System.out.println(e+" Unexpected File IO Error.");
   }
 return idList;
}

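When I try to parse a file with one million records, the Java process fails after processing a certain amount of data. I get a java.lang.OutOfMemoryError: Java heap space error. I can partly work out that the Java process stops because of the large amount of data supplied. Please suggest how I should handle data this large.

Edit: does this part of the code, new BufferedReader(new FileReader(new File(fileName)));, parse the whole file, and is it affected by the size of the file?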

Answer 1

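The problem you have is that you are accumulating all the data in the list. The best way to solve this is to process the file in a streaming fashion. That means not accumulating all the IDs in the list, but instead calling your web service for each line, or accumulating a smaller buffer and then making the call.

Opening the file and creating the BufferedReader has no impact on memory consumption, because the bytes from the file are read (more or less) line by line. The problem is the line idList.add(fields[0]);: since you keep accumulating all the file's data into the list, it grows as large as the file.

Your code should do something like this: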

 while ((line = cfgFile.readLine()) != null) {
   if (!line.trim().equals("")) {
     String [] fields = line.split("\\|"); 
     callToRemoteWebService(fields[0]);
   } 
 }
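The other option mentioned above, accumulating a smaller buffer and then making the call, could look roughly like the sketch below. It is only an illustration under assumptions: the class BatchedIdReader, the method forEachBatch, the batch size and the sink callback are hypothetical names, and the sink stands in for the two web service calls from the question. The first line is discarded, mirroring the question's code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchedIdReader {

    // Reads pipe-delimited lines, discards the first line, and hands the first
    // field of every non-blank line to the sink in batches of at most batchSize.
    // Returns the total number of IDs read.
    static int forEachBatch(Reader source, int batchSize, Consumer<List<String>> sink)
            throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int total = 0;
        List<String> batch = new ArrayList<String>(batchSize);
        try (BufferedReader in = new BufferedReader(source)) {
            in.readLine(); // discard the first line, as the question's code does
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                // A limit of 2 guarantees at least one element, even for "|".
                batch.add(line.split("\\|", 2)[0]);
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<String>(batchSize); // start a fresh buffer
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final, possibly smaller, batch
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        String data = "id|name\n1|a\n2|b\n\n3|c\n";
        // The lambda is a placeholder for the real web service calls.
        int n = forEachBatch(new StringReader(data), 2,
                ids -> System.out.println("sending " + ids));
        System.out.println(n + " ids processed");
    }
}
```

At most batchSize IDs are held at any time, so memory use no longer depends on the size of the file.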

Answer 2

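Increase your Java heap size with the -Xms and -Xmx options. If they are not set explicitly, the JVM sets the heap size to an ergonomic default, which is not enough in your case. Read this paper to learn more about tuning memory in the JVM: http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf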
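To see what heap limit the JVM actually picked, a small check like the following can help. It is a minimal sketch: the class name HeapInfo and the helper toMiB are hypothetical, and the values in the example command (java -Xms256m -Xmx1g HeapInfo) are purely illustrative. Runtime.maxMemory() reports the maximum amount of memory the JVM will attempt to use.

```java
final class HeapInfo {

    // Converts a byte count to whole mebibytes.
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Upper bound the heap may grow to (governed by -Xmx when it is set).
        System.out.println("max heap (MiB):   " + toMiB(rt.maxMemory()));
        // Memory currently reserved by the JVM for the heap.
        System.out.println("total heap (MiB): " + toMiB(rt.totalMemory()));
        // Part of the reserved memory that is currently unused.
        System.out.println("free heap (MiB):  " + toMiB(rt.freeMemory()));
    }
}
```

Running it once without options and once with -Xmx shows whether the setting took effect.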

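Edit: Another way is to do this in a producer-consumer fashion to take advantage of parallel processing. The general idea is to create a producer thread that reads the file and queues tasks for processing, and n consumer threads that consume them. A very general idea (for illustration purposes) is the following: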

// blocking queue holding the tasks to be executed
final SynchronousQueue<Callable<Void>> queue = // ...

// reads the file and submits tasks for processing
final Runnable producer = new Runnable() {
  public void run() {
     BufferedReader in = null;
     try {
         in = new BufferedReader(new FileReader(new File(fileName)));
         String line = null;
         while ((line = in.readLine()) != null) {
             if (!line.trim().equals("")) {
                 // must be final to be captured by the anonymous class below
                 final String[] fields = line.split("\\|");
                 // this will block until a consumer thread is available to take the task...
                 queue.put(new Callable<Void>() {
                     public Void call() {
                         process(fields);
                         return null;
                     }
                  });
              }
          }
     } catch (IOException e) {
         // report the read failure here...
     } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
     } finally {
         // close the buffered reader here...
     }
  }
};

// Consumes the tasks submitted from the producer. Consumers can be pooled
// for parallel processing.
final Runnable consumer = new Runnable() {
  public void run() {
    try {
        while (true) {
            // this method blocks until the producer hands over a task...
            Callable<Void> task = queue.take();
            task.call();
        }
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
    } catch (Exception e) {
        // Callable.call() is declared to throw Exception; handle task failures here...
    }
  }
};

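Of course, you have to write code to manage the lifecycle of the consumer and producer threads. The right way to do this is to implement it with an Executor.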
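As a rough idea of what the Executor-based variant might look like, here is a sketch built on a ThreadPoolExecutor with a bounded queue. Everything specific in it is an assumption made for illustration: the class ExecutorPipeline, the method processAll, the handler callback (a stand-in for the web service calls) and the worker and queue sizes are hypothetical. Using CallerRunsPolicy is also a choice of this sketch, not something stated above: when the queue is full, the reading thread runs the task itself, which slows reading down instead of letting tasks pile up in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

final class ExecutorPipeline {

    // Reads every non-blank line, extracts the first pipe-delimited field and
    // submits it to a fixed pool of workers. Returns the number of tasks submitted.
    static int processAll(Reader source, int workers, int queueCapacity,
                          Consumer<String> handler)
            throws IOException, InterruptedException {
        ExecutorService pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity), // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());      // throttle the reader when full
        int submitted = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue;
                }
                final String id = line.split("\\|", 2)[0];
                pool.execute(() -> handler.accept(id));
                submitted++;
            }
        } finally {
            pool.shutdown(); // accept no new tasks; queued ones still run
        }
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("workers did not finish in time");
        }
        return submitted;
    }

    public static void main(String[] args) throws Exception {
        String data = "1|a\n2|b\n3|c\n";
        AtomicInteger handled = new AtomicInteger();
        // The lambda is a placeholder for the real per-record work.
        int n = processAll(new StringReader(data), 2, 4, id -> handled.incrementAndGet());
        System.out.println(n + " submitted, " + handled.get() + " handled");
    }
}
```

The executor owns the worker threads, so there is no hand-written consumer loop, and shutdown plus awaitTermination covers the lifecycle management mentioned above.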

Answer 3

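When you want to work with large data, you have two options:

Use a heap large enough to hold all the data. This will "work" for a while, but if your data size is unbounded, it will eventually fail.
Process the data incrementally. Keep only part of the data (of bounded size) in memory at any one time. This is the ideal solution, because it scales to any amount of data.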

Answer 1
3, accepted, 2012-09-28 14:26:47

Answer 2
2, 2012-09-28 14:36:14

Answer 3
1, 2012-09-28 14:26:32
