
Java: Most Efficient Way to Get Input Integer Array

I'm working on a problem that requires me to store a very large number of integers in an integer array. The input is formatted so that one line gives the number of integers and the next line gives all of the values to be stored. Ex:

3
12 45 67

In the actual problem there are closer to 100,000 integers to store. Currently I am using this method of storing the integers:

Scanner scanner = new Scanner(System.in);
int n = scanner.nextInt();

int[] iVau = new int[n];

scanner.nextLine(); // consume the rest of the first line left behind by nextInt()
String[] temp = scanner.nextLine().split(" ");

for(int i = 0; i < n; i++) {
    iVau[i] = Integer.parseInt(temp[i]);
}

This works fine; however, the problem I am solving has a strict time limit and my current solution exceeds it. I know there is a more efficient way to read this input using buffered readers and input streams, but I don't know how to do it. Can someone please show me?

The way you are using Scanner makes your program hold a String containing all of the numbers in memory at once. With 100,000 numbers on the second line of your input, that is not very efficient: you could read the numbers one after the other without keeping the previous ones in memory as text. So avoiding Scanner.nextLine() should make your program run faster. You no longer have to read the whole line once and then go over that String a second time to parse the integers from it: you do both operations in a single pass.

Here is an example. The method testing() does not use any Scanner. The method testing2() is the one you provided. The file tst.txt contains 100000 numbers. The output from this program, on my Mac Mini (Intel Core i5@2.6GHz), is:

duration without reading one line at a time, without using a Scanner instance: 140 ms
duration when reading one line at a time with a Scanner instance: 198 ms

As you can see, the Scanner version takes about 41% longer in this run (the integer part of (198-140)/140*100 is 41).

package test1;
import java.io.*;
import java.util.*;

public class Test {
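    // Read and parse an int from the stream in one pass (non-negative values only)
    private static int readInt(InputStreamReader ir) throws IOException {
        StringBuilder str = new StringBuilder();
        int c;
        // Skip everything that is not a digit; fail instead of looping forever at end of stream
        do {
            c = ir.read();
            if (c == -1) throw new EOFException("no integer left in the input");
        } while (c < '0' || c > '9');
        // Collect consecutive digits; read() returns -1 at end of stream, which also ends the loop
        do {
            str.append((char) c);
            c = ir.read();
        } while (c >= '0' && c <= '9');
        return Integer.parseInt(str.toString());
    }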

    // Parsing the input step by step
    private static void testing(File f) throws IOException {
        InputStreamReader ir = new InputStreamReader(new BufferedInputStream(new FileInputStream(f)));
        int n = readInt(ir);
        int [] iVau = new int[n];
        for (int i = 0; i < n; i++) iVau[i] = readInt(ir);
        ir.close();
    }

    // Your code
    private static void testing2(File f) throws IOException {
        Scanner scanner = new Scanner(f);
        int n = scanner.nextInt();
        int[] iVau = new int[n];
        scanner.nextLine();     
        String[] temp = scanner.nextLine().split(" ");
        for(int i = 0; i < n; i++)
            iVau[i] = Integer.parseInt(temp[i]);
        scanner.close();
    }

    // Compare durations
    public static void main(String[] args) throws IOException {
        File f = new File("/tmp/tst.txt");          

        // My proposal    
        long t = System.currentTimeMillis();
        testing(f);
        System.out.println("duration without reading one line at a time, without using a Scanner instance: " + (System.currentTimeMillis() - t) + " ms");       

        // Your code    
        t = System.currentTimeMillis();
        testing2(f);
        System.out.println("duration when reading one line at a time with a Scanner instance: " + (System.currentTimeMillis() - t) + " ms");
    }
}

NOTE: the input file is created this way, with bash or zsh:

echo 100000 > /tmp/tst.txt
for i in {1..100000}
do
  echo -n $i" " >> /tmp/tst.txt
done

I believe this is what you're looking for. BufferedReader.readLine() returns one whole line at a time, so it is necessary to split the line and parse each String into an int.

BufferedReader br = new BufferedReader(new InputStreamReader(System.in));

try {
    int n = Integer.parseInt(br.readLine());
    int[] arr = new int[n];

    String[] line = br.readLine().split(" ");
    for (int i = 0; i < n; i++) {
        arr[i] = Integer.parseInt(line[i]);
    }
} catch (IOException e) {
    e.printStackTrace(); // getStackTrace() only returns the trace and would silently discard the error
}
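
A common variation on the same idea replaces String.split with StringTokenizer, which walks the line lazily instead of building an array of n strings first. The following is a minimal sketch rather than part of the original answer; the class name TokenizerInput and the helper readArray are hypothetical, and a StringReader stands in for System.in so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;
import java.util.StringTokenizer;

public class TokenizerInput {
    // Hypothetical helper: reads the count from the first line, then that many integers from the second.
    static int[] readArray(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        int n = Integer.parseInt(br.readLine().trim());
        int[] arr = new int[n];
        // StringTokenizer hands out one token at a time, so no String[] of n elements is allocated
        StringTokenizer st = new StringTokenizer(br.readLine());
        for (int i = 0; i < n; i++) {
            arr[i] = Integer.parseInt(st.nextToken());
        }
        return arr;
    }

    public static void main(String[] args) throws IOException {
        // For real input, pass new InputStreamReader(System.in) instead of the StringReader
        int[] arr = readArray(new StringReader("3\n12 45 67\n"));
        System.out.println(Arrays.toString(arr)); // prints [12, 45, 67]
    }
}
```

Whether this is measurably faster than split depends on the JVM and the input, so it is worth timing both on the actual data.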

Just a thought: String.split returns an array of Strings. You say the input can be around 100,000 values. To split the line this way, String.split has to iterate over the whole line. Then, when parsing the new array of Strings into ints, you iterate over the data a second time. You could do this in one iteration with a few small tweaks.

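Scanner scanner = new Scanner(System.in);
int n = Integer.parseInt(scanner.nextLine().trim()); // first line: the count
int[] arr = new int[n];
String tmp = scanner.nextLine();                     // second line: the values
scanner = new Scanner(tmp);

for (int i = 0; i < n && scanner.hasNextInt(); i++) {
  arr[i] = scanner.nextInt();
}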

The reason for linking the scanner to a String instead of leaving it on System.in is so that the loop ends properly: it does not block on System.in waiting for more user input after the last token. Strictly speaking, O(2n) and O(n) are the same class in big O notation, but the original snippet makes two passes over the data where this one makes one.
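
If the goal is a single pass over the line, the digits can also be accumulated directly while walking the String, which avoids both String.split and a second Scanner. This is a minimal sketch, not the answerer's code; the class SinglePassParse and the helper parseInts are hypothetical names, and it assumes well-formed input whose values fit in an int.

```java
import java.util.Arrays;

public class SinglePassParse {
    // Hypothetical helper: parses up to n whitespace-separated integers from one line.
    static int[] parseInts(String line, int n) {
        int[] arr = new int[n];
        int pos = 0;
        int len = line.length();
        for (int i = 0; i < n; i++) {
            // Skip spaces, tabs and other control characters before the next number
            while (pos < len && line.charAt(pos) <= ' ') pos++;
            if (pos == len) break; // fewer than n values: remaining slots stay 0
            boolean negative = line.charAt(pos) == '-';
            if (negative) pos++;
            int value = 0;
            // Accumulate digits; this sketch does no overflow or format checking
            while (pos < len && line.charAt(pos) >= '0' && line.charAt(pos) <= '9') {
                value = value * 10 + (line.charAt(pos) - '0');
                pos++;
            }
            arr[i] = negative ? -value : value;
        }
        return arr;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseInts("12 -45 67", 3))); // prints [12, -45, 67]
    }
}
```

Each character of the line is inspected once, and no intermediate String objects are created for the individual numbers.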

I am not quite sure why OP has to use Integer.parseInt(s) here, since a Scanner created with new Scanner(File source) can do the parsing directly.

Here is a demo/test for this idea:

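import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.Scanner;
import java.util.stream.Collectors;

public class NextInt {
    // Timer is the answerer's own timing helper; its source is not shown in the post.
    public static void main(String... args) {
        prepareInputFile(1000, 500); // create 1_000 arrays, each containing 500 numbers
        Timer.timer(() -> readFromFile(), 20, "NextInt"); // read from the file 20 times using Scanner.nextInt()
        Timer.timer(() -> readTest(), 20, "Split"); // read from the file 20 times using split() and Integer.parseInt()
    }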

    private static void readTest() {
        Path inputPath = Paths.get(Paths.get("").toAbsolutePath().toString().concat("/src/main/java/io/input.txt"));
        try (Scanner scanner = new Scanner(new File(inputPath.toString()))) {
            int n = Integer.valueOf(scanner.nextLine());
            int[] iVau = new int[n];
            String[] temp = scanner.nextLine().split(" ");
            for (int i = 0; i < n; i++) {
                iVau[i] = Integer.parseInt(temp[i]);
            }
        } catch (IOException ignored) {
            ignored.printStackTrace();
        }
    }

    private static void readFromFile() {
        Path inputPath = Paths.get(Paths.get("").toAbsolutePath().toString().concat("/src/main/java/io/input.txt"));
        try (Scanner scanner = new Scanner(new File(inputPath.toString()))) {
            while (scanner.hasNextInt()) {
                int arrSize = scanner.nextInt();
                int[] arr = new int[arrSize];
                for (int i = 0; i < arrSize; ++i) {
                    arr[i] = scanner.nextInt();
                }
//                System.out.println(Arrays.toString(arr));
            }
        } catch (IOException ignored) {
            ignored.printStackTrace();
        }
    }

    private static void prepareInputFile(int arrCount, int arrSize) {
        Path outputPath = Paths.get(Paths.get("").toAbsolutePath().toString().concat("/src/main/java/io/input.txt"));
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < arrCount; ++i) {
            int[] arr = new int[arrSize];
            for (int j = 0; j < arrSize; ++j) {
                arr[j] = new Random().nextInt();
            }
            lines.add(String.valueOf(arrSize));
            lines.add(Arrays.stream(arr).mapToObj(String::valueOf).collect(Collectors.joining(" ")));
        }
        try {
            Files.write(outputPath, lines);
        } catch (IOException ignored) {
            ignored.printStackTrace();
        }
    }
}

Tested locally with 1_000 arrays of 500 numbers each, the run took about 340 ms using Scanner.nextInt(), while OP's method took about 1.5 ms. Note that readTest() as posted parses only the first array in the file, whereas readFromFile() parses all 1_000, so the two figures are not a like-for-like comparison.

NextInt: LongSummaryStatistics{count=20, sum=6793762162, min=315793916, average=339688108.100000, max=618922475}
Split: LongSummaryStatistics{count=20, sum=26073528, min=740860, average=1303676.400000, max=5724370}
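
The Timer helper used in the demo is not included in the post. Judging only from the output format above, it probably collects one elapsed time per run into a LongSummaryStatistics. The following is a hypothetical reconstruction under that assumption (the class name SimpleTimer and the nanosecond unit are guesses, not the answerer's code):

```java
import java.util.LongSummaryStatistics;

public class SimpleTimer {
    // Hypothetical stand-in for Timer.timer: runs the task `times` times
    // and summarises the elapsed time of each run in nanoseconds.
    public static LongSummaryStatistics timer(Runnable task, int times, String label) {
        LongSummaryStatistics stats = new LongSummaryStatistics();
        for (int i = 0; i < times; i++) {
            long start = System.nanoTime();
            task.run();
            stats.accept(System.nanoTime() - start);
        }
        // LongSummaryStatistics.toString() reports count, sum, min, average and max
        System.out.println(label + ": " + stats);
        return stats;
    }

    public static void main(String[] args) {
        timer(() -> Math.sqrt(42.0), 20, "Sqrt");
    }
}
```

A helper like this does not separate JIT warm-up from steady-state runs, which may explain why max is so far from min in the figures above.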

So I really doubt the issue lies in the input reading.

Since in your case you know the total count of elements, all you have to do is read that many integers from the second line. Here is an example:

public static void main(String[] args) {
        Scanner in = new Scanner(System.in);

        int count = in.nextInt();
        int array[] = new int[count];

        for (int i = 0; i < count; i++) {
            array[i] = in.nextInt();
        }
}

If this is not fast enough, which I doubt, then you could switch to a BufferedReader as follows:

public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));

        int count = Integer.parseInt(in.readLine());
        int array[] = new int[count];

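        for (int i = 0; i < count; i++) {
            int nextChar = in.read();
            // Skip separators (spaces, '\r', '\n') in front of the number
            while (nextChar != -1 && (nextChar < '0' || nextChar > '9')) {
                nextChar = in.read();
            }
            int nextInteger = 0;
            // Accumulate digits (non-negative values only); stops at a separator or at end of stream (-1)
            while (nextChar >= '0' && nextChar <= '9') {
                nextInteger = nextInteger * 10 + (nextChar - '0');
                nextChar = in.read();
            }
            array[i] = nextInteger;
        }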
}

In your case the input will always be valid, which means that the integers will be separated by a single whitespace and the input will end with EOF.
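
Another standard-library option between Scanner and hand-written digit parsing is java.io.StreamTokenizer, which reads numbers token by token from a Reader. The sketch below is an illustration, not part of the original answer; the class TokenizerStreamInput and the helper readArray are hypothetical names. StreamTokenizer stores numbers in the double field nval, so the cast to int assumes the values fit in an int.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.Arrays;

public class TokenizerStreamInput {
    // Hypothetical helper: reads the count, then that many integers, ignoring line breaks.
    static int[] readArray(Reader source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new BufferedReader(source));
        if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
            throw new IOException("expected the element count");
        }
        int n = (int) st.nval;
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            if (st.nextToken() != StreamTokenizer.TT_NUMBER) {
                throw new IOException("expected a number at index " + i);
            }
            arr[i] = (int) st.nval; // nval is a double; assumed to hold an int-sized value
        }
        return arr;
    }

    public static void main(String[] args) throws IOException {
        // For real input, pass new InputStreamReader(System.in) instead of the StringReader
        int[] arr = readArray(new StringReader("3\n12 -45 67\n"));
        System.out.println(Arrays.toString(arr)); // prints [12, -45, 67]
    }
}
```

By default the tokenizer treats line breaks as ordinary whitespace, so it does not matter whether the values sit on one line or several.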

If both are still too slow for you, then you could keep looking for more articles about reading integers in Java for competitive programming, like this one: https://www.geeksforgeeks.org/fast-io-in-java-in-competitive-programming/
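
Fast-input helpers of this kind usually read raw bytes into a large buffer and assemble the integers by hand. The following is a minimal sketch of that general technique under stated assumptions, not code taken from the linked article: the class FastIntReader, the 64 KiB buffer size and the ASCII-only input are all illustrative choices.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FastIntReader {
    private final InputStream in;
    private final byte[] buffer = new byte[1 << 16]; // 64 KiB, an arbitrary choice
    private int length = 0;
    private int position = 0;

    FastIntReader(InputStream in) {
        this.in = in;
    }

    // Returns the next byte, refilling the buffer when it is exhausted; -1 at end of stream
    private int readByte() throws IOException {
        if (position == length) {
            length = in.read(buffer, 0, buffer.length);
            position = 0;
            if (length <= 0) {
                length = 0;
                return -1;
            }
        }
        return buffer[position++];
    }

    // Reads the next (possibly negative) integer; no overflow checking in this sketch
    int nextInt() throws IOException {
        int c = readByte();
        while (c != -1 && c != '-' && (c < '0' || c > '9')) c = readByte();
        if (c == -1) throw new EOFException("no integer left in the input");
        boolean negative = (c == '-');
        if (negative) c = readByte();
        int value = 0;
        while (c >= '0' && c <= '9') {
            value = value * 10 + (c - '0');
            c = readByte();
        }
        return negative ? -value : value;
    }

    public static void main(String[] args) throws IOException {
        // For real input, pass System.in instead of the in-memory stream
        byte[] data = "3\n12 -45 67\n".getBytes(StandardCharsets.US_ASCII);
        FastIntReader reader = new FastIntReader(new ByteArrayInputStream(data));
        int[] arr = new int[reader.nextInt()];
        for (int i = 0; i < arr.length; i++) arr[i] = reader.nextInt();
        System.out.println(Arrays.toString(arr)); // prints [12, -45, 67]
    }
}
```

Working on bytes skips character decoding and per-token String allocation, which is where such readers tend to gain over Scanner.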

Still, my favorite language when it comes to competitions will always be C :) Good luck and enjoy!


 