Java: memory usage of "Scanner"

Question

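I'm running an online automatic program evaluation platform, and for one of the exercises the Java "Scanner" is using far too much memory (we have only just started supporting Java, so the problem did not come up before). Since we teach algorithms to beginners, we can't simply ask them to recode it themselves by reading one byte after the other.

According to our tests, the Scanner uses up to 200 bytes to read one integer...

The exercise: given 10,000 integers, which window of 100 consecutive integers has the largest sum?

The memory needed is small (you only have to remember the last 100 integers), but between the classic version with "Scanner / nextInt()" and the manual version (see below) we see a difference of 2.5 Mb in memory.

2.5 Mb to read 10,000 integers ==> about 200 bytes to read one integer?

Is there a simple solution that can be explained to a beginner, or is the following function (or something similar) the way to go?

Our test function, which reads integers much faster while using much less memory: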

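    // Reads the next decimal integer from System.in, one byte at a time.
    // Requires: import java.io.IOException;
    public static int read_int() throws IOException {
        int number = 0;
        int signe = 1;
        int byteRead = System.in.read();
        // Skip anything that cannot start a number; stop at end of input (-1)
        // so that a missing number does not loop forever.
        while (byteRead != -1 && byteRead != '-' && ((byteRead < '0') || ('9' < byteRead)))
            byteRead = System.in.read();
        if (byteRead == '-') {
            signe = -1;
            byteRead = System.in.read();
        }
        // Accumulate the digits (no overflow check).
        while (('0' <= byteRead) && (byteRead <= '9')) {
            number *= 10;
            number += byteRead - '0';
            byteRead = System.in.read();
        }
        return signe * number;
    }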

The code using Scanner, as requested:

    import java.util.Scanner;

    class Main {
        public static void main(String[] args) {
            Scanner sc = new Scanner(System.in);
            int nbValues = sc.nextInt();
            int widthWindow = sc.nextInt();
            int values[] = new int[widthWindow];
            int sumValues = 0;
            for (int idValue = 0; idValue < widthWindow; idValue++) {
                values[idValue] = sc.nextInt();
                sumValues += values[idValue];
            }
            int maximum = sumValues;
            for (int idValue = widthWindow; idValue < nbValues; idValue++) {
                sumValues -= values[idValue % widthWindow];
                values[idValue % widthWindow] = sc.nextInt();
                sumValues += values[idValue % widthWindow];
                if (maximum < sumValues)
                    maximum = sumValues;
            }
            System.out.println(maximum);
        }
    }

As requested, the memory used as a function of the number of integers:

10,000: 2.5 Mb
20,000: 5 Mb
50,000: 15 Mb
100,000: 30 Mb
200,000: 50 Mb
300,000: 75 Mb
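
As a rough check of the "200 bytes per integer" estimate, the measurements above can simply be divided out. This is only a sketch: it assumes that 1 Mb means 1,000,000 bytes (the question does not say), and the class name is made up for illustration.

```java
// Rough per-integer cost derived from the measurements listed above.
// Assumption: "Mb" is taken as 1,000,000 bytes; the class name is hypothetical.
class BytesPerInteger {
    static final int[] COUNTS = {10_000, 20_000, 50_000, 100_000, 200_000, 300_000};
    static final double[] MEGABYTES = {2.5, 5, 15, 30, 50, 75};

    // Memory used divided by the number of integers read, rounded to whole bytes.
    static long bytesPerInteger(int count, double megabytes) {
        return Math.round(megabytes * 1_000_000 / count);
    }

    public static void main(String[] args) {
        for (int i = 0; i < COUNTS.length; i++) {
            System.out.println(COUNTS[i] + " integers: about "
                    + bytesPerInteger(COUNTS[i], MEGABYTES[i]) + " bytes each");
        }
    }
}
```

Under that assumption every row comes out at 250 to 300 bytes per integer, so the overhead grows roughly linearly with the size of the input.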

Answer 1

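We finally decided to rewrite (part of) the Scanner class. This way we only need to include our Scanner instead of Java's, and the rest of the code stays the same. We no longer have any memory problems, and the program is 20 times faster.

The code below comes from my colleague Christoph Dürr: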

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

class Locale {
   final static int US=0;
}

public class Scanner {
   private BufferedInputStream in;

   int c;

   boolean atBeginningOfLine;

   public Scanner(InputStream stream) {
      in = new BufferedInputStream(stream);
      try {
         atBeginningOfLine = true;
         c = in.read(); // keep the int: casting to char would turn EOF (-1) into 65535
      } catch (IOException e) {
         c  = -1;
      }
   }

   public boolean hasNext() {
      if (!atBeginningOfLine) 
         throw new Error("hasNext only works "+
         "after a call to nextLine");
      return c != -1;
   }

   public String next() {
      StringBuilder sb = new StringBuilder(); // unsynchronized, sufficient for a local buffer
      atBeginningOfLine = false;
      try {
         while (c != -1 && c <= ' ') { // skip whitespace, but stop at EOF
            c = in.read();
         } 
         while (c > ' ') {
            sb.append((char)c);
            c = in.read();
         }
      } catch (IOException e) {
         c = -1;
         return "";
      }
      return sb.toString();
   }

   public String nextLine() {
      StringBuilder sb = new StringBuilder(); // unsynchronized, sufficient for a local buffer
      atBeginningOfLine = true;
      try {
         while (c != -1 && c != '\n') { // stop at EOF as well as at end of line
            sb.append((char)c);
            c = in.read();
         }
         c = in.read();
      } catch (IOException e) {
         c = -1;
         return "";
      }
      return sb.toString();   
   }

   public int nextInt() {
      String s = next();
      try {
         return Integer.parseInt(s);
      } catch (NumberFormatException e) {
         return 0; //throw new Error("Malformed number " + s);
      }
   }

   public double nextDouble() {
      return Double.parseDouble(next()); // avoids the deprecated Double constructor and boxing
   }

   public long nextLong() {
      return Long.parseLong(next());
   } 

   public void useLocale(int l) {}
}

It could be even faster by integrating the code from my question, where we "build" the number while reading it character by character.
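
A minimal sketch of that idea, assuming plain ASCII decimal input: `nextInt()` accumulates the digits straight from the buffered stream, the way `read_int()` in the question does, instead of building a String first. The class name `FastIntReader` is hypothetical and not part of the code above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reader: builds each int digit by digit, with no intermediate String.
class FastIntReader {
    private final InputStream in;

    FastIntReader(InputStream stream) {
        in = new BufferedInputStream(stream);
    }

    // Reads the next decimal integer; returns 0 if the input ends first.
    // This sketch does no overflow or malformed-input checking.
    int nextInt() throws IOException {
        int c = in.read();
        // Skip everything that cannot start a number, stopping at EOF (-1).
        while (c != -1 && c != '-' && (c < '0' || c > '9')) {
            c = in.read();
        }
        int sign = 1;
        if (c == '-') {
            sign = -1;
            c = in.read();
        }
        int number = 0;
        while (c >= '0' && c <= '9') {
            number = number * 10 + (c - '0');
            c = in.read();
        }
        return sign * number;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "3 -12\n405".getBytes(StandardCharsets.US_ASCII);
        FastIntReader reader = new FastIntReader(new ByteArrayInputStream(data));
        System.out.println(reader.nextInt()); // 3
        System.out.println(reader.nextInt()); // -12
        System.out.println(reader.nextInt()); // 405
    }
}
```

Note that the character following each number is consumed as its delimiter, which is fine for whitespace-separated input but would need care if this were mixed with line-based reading.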

Answer 2

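I came across this question while investigating severe memory bloat in an Android app I'm developing.

Android has a tool that records all allocations.

It turns out that for parsing just ONE nextDouble() call, Java makes 128 allocations. The top 8 are each over 1000 bytes, and the largest is 4102 bytes (!)

Needless to say, this is totally unusable. We are struggling to keep battery drain low, and this really does not help.

I'll try the replacement Scanner code posted here, thanks!

Here is the evidence: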

4047    4102    char[]  13      java.lang.AbstractStringBuilder enlargeBuffer   
4045    3070    char[]  13      java.lang.String        <init>  
4085    2834    char[]  13      java.lang.AbstractStringBuilder enlargeBuffer   
4048    2738    char[]  13      java.lang.AbstractStringBuilder enlargeBuffer   
4099    1892    char[]  13      java.lang.AbstractStringBuilder enlargeBuffer   
4108    1264    char[]  13      java.lang.AbstractStringBuilder enlargeBuffer   
4118    1222    char[]  13      java.lang.AbstractStringBuilder enlargeBuffer   
4041    1128    int[]   13      java.util.regex.Matcher usePattern  
[...]

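The second column is the allocation size (presumably in bytes, but Android Device Monitor does not specify the unit).

Bottom line: don't use Scanner unless you have plenty of battery and CPU to spare.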
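
For anyone in the same situation, one commonly used lighter-weight approach is to read whole lines with `BufferedReader` and parse the tokens with `Double.parseDouble`. This is a different technique from the replacement Scanner in Answer 1, and I have not measured its allocations here; the class and method names below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.StringTokenizer;

// Hypothetical helper: sums whitespace-separated doubles without java.util.Scanner.
class DoubleSum {
    static double sum(Reader source) throws IOException {
        BufferedReader reader = new BufferedReader(source);
        double total = 0.0;
        String line;
        while ((line = reader.readLine()) != null) {
            // StringTokenizer splits on whitespace by default and uses no regex.
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                total += Double.parseDouble(tokens.nextToken());
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sum(new StringReader("1.5 2.25\n-0.75"))); // 3.0
    }
}
```

Unlike Scanner, `Double.parseDouble` is not locale-aware, so this only fits input that uses a dot as the decimal separator.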

Answer 3

Here is the code of Scanner's nextInt(int radix):

    public int nextInt(int radix) {
        // Check cached result
        if ((typeCache != null) && (typeCache instanceof Integer)
            && this.radix == radix) {
            int val = ((Integer)typeCache).intValue();
            useTypeCache();
            return val;
        }
        setRadix(radix);
        clearCaches();
        // Search for next int
        try {
            String s = next(integerPattern());
            if (matcher.group(SIMPLE_GROUP_INDEX) == null)
                s = processIntegerToken(s);
            return Integer.parseInt(s, radix);
        } catch (NumberFormatException nfe) {
            position = matcher.start(); // don't skip bad token
            throw new InputMismatchException(nfe.getMessage());
        }
    }

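As you can see, it is radix and sign aware, uses caches, and so on. So the extra memory usage all comes from features that are meant to improve the Scanner's efficiency.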
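
To illustrate what those features buy you, here is a small sketch using the real `java.util.Scanner` on a String. The class name and the sample input are made up for illustration; it shows locale-specific group separators, sign handling, and a non-decimal radix, none of which a hand-written digit loop supports.

```java
import java.util.Locale;
import java.util.Scanner;

// Hypothetical demo of the parsing features behind Scanner's nextInt.
class ScannerFeatures {
    // Expects three tokens: a grouped number, a signed number, a hex number.
    static int[] parse(String input) {
        Scanner sc = new Scanner(input).useLocale(Locale.US);
        int grouped = sc.nextInt();   // "1,234": the US group separator is accepted
        int negative = sc.nextInt();  // "-42": sign handling
        int hex = sc.nextInt(16);     // "ff": parsed in radix 16
        sc.close();
        return new int[] {grouped, negative, hex};
    }

    public static void main(String[] args) {
        int[] result = parse("1,234 -42 ff");
        System.out.println(result[0]); // 1234
        System.out.println(result[1]); // -42
        System.out.println(result[2]); // 255
    }
}
```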

Answer 4

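You could read all the values into an array, and then start summing over the array.

While reading into the array you still need that much memory, but after reading it is free for other purposes.

The structure of your code would benefit, imho, because you could then use a different source for your numbers, for example util.Random, and still search the array for the maximum sum, or search the same array for different sequence lengths, without rereading the input.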
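
A minimal sketch of that separation, with hypothetical names: the window search takes a plain array and a width, so the same array can be searched with several widths and filled from any source, here `java.util.Random` with a fixed seed.

```java
import java.util.Random;

// Hypothetical helper: the search is independent of where the numbers come from.
class WindowSums {
    // Largest sum of `width` consecutive elements.
    // Assumes 1 <= width <= values.length.
    static int maxWindowSum(int[] values, int width) {
        int sum = 0;
        for (int i = 0; i < width; i++) {
            sum += values[i];
        }
        int maximum = sum;
        for (int i = width; i < values.length; i++) {
            // Slide the window: add the new element, drop the oldest one.
            sum += values[i] - values[i - width];
            if (sum > maximum) {
                maximum = sum;
            }
        }
        return maximum;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        int[] values = new int[10_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = random.nextInt(1000);
        }
        // Same array, several window widths, no input re-read.
        for (int width : new int[] {10, 100, 1000}) {
            System.out.println("width " + width + ": " + maxWindowSum(values, width));
        }
    }
}
```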

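By the way, I had a hard time reading the code, because:

value / values / sumValues / nb_values (why not maximumValues?): all variables are values, so this does not help understanding.
Loops are usually indexed with i and j, or n. "value" is misleading.
length_sequence is misleading too. Sequence length is what is meant, but everybody just uses 'length', since there is no ambiguity with other lengths.
You use long names for trivial things, but a cryptic abbreviation for something less trivial. I read your problem description and your code, and I still don't know what your code does: what do you mean by nb_values? Non-blocking? Null bytes? Nearby? What is it?

My first impression was that, for a stream of ints:

3 9 2 4 6 4 3 2 4 4 5 6 9 3 2 1 9 9 9

you would search for sequences of length 3 up to the 9th value (not counting the 3 and the 9 themselves) and look for the maximum of (2 + 4 + 6), (4 + 6 + 4), ... (4 + 4 + 5), but the result is 34. You add up the first 9 values.

Suggestion: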

import java.util.Scanner;

class MaxChunk {

   int chunksize;

   public int[] readValues () {
      Scanner sc = new Scanner (System.in);
      chunksize = sc.nextInt ();
      int length = sc.nextInt ();
      int values[] = new int [length];
      for (int i = 0; i < length; i++)
      {
         values[i] = sc.nextInt();
      }   
      return values;
   }

   public int calc (int values[]) {
      int sum = 0;
      for (int i = 0; i < chunksize; i++)
      {
         sum += values[i];
      }

      int maximum = sum;

      for (int j = chunksize; j < values.length; j++)
      {
         sum -= values [j - chunksize];
         sum += values [j];
         if (maximum < sum)
             maximum = sum;
      }
      return maximum;  
   }

   public static void main (String[] args) {
      MaxChunk maxChunk = new MaxChunk ();
      int values[] = maxChunk.readValues ();
      System.out.println (maxChunk.calc (values));
   }
}

echo "3 9 2 4 6 4 3 2 4 4 5 6 9 3 2 1 9 9" | java MaxChunk

yields 14.

4 answers

Answer 1: 1 (accepted), 2012-01-14 11:44:21
Answer 2: 0
Answer 3: 0, 2011-11-15 11:58:41
Answer 4: 0, 2011-11-29 14:40:30
