Java: looping through an array - optimization

Question

I have some Java code that works as expected, but it takes a few seconds to run, even though all it does is loop through an array.

The input is a Fasta file, as shown in the image below. The file I'm using is 2.9 MB, and some other Fasta files can be up to 20 MB.

[Image: excerpt of the input Fasta file]

In the code I'm trying to loop through it in groups of three, e.g. AGC TTT TCA ... etc. The code makes no functional sense for now, but what I want is to attach each amino acid to its corresponding group of bases. Example:

AGC - Ser / CUG - Leu / ... etc.

So what's wrong with the code? Is there a better way to do it? Any optimization? Looping through the whole String takes some time, maybe only seconds, but I need to find a better way to do it.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class fasta {
    public static void main(String[] args) throws IOException {

        File fastaFile;
        FileReader fastaReader;
        BufferedReader fastaBuffer = null;
        StringBuilder fastaString = new StringBuilder();

        try {
            fastaFile = new File("res/NC_017108.fna");
            fastaReader = new FileReader(fastaFile);
            fastaBuffer = new BufferedReader(fastaReader);
            String fastaDescription = fastaBuffer.readLine();
            String line = fastaBuffer.readLine();

            while (line != null) {
                fastaString.append(line);
                line = fastaBuffer.readLine();
            }

            System.out.println(fastaDescription);
            System.out.println();
            String currentFastaAcid;

            for (int i = 0; i < fastaString.length(); i+=3) {
                currentFastaAcid = fastaString.toString().substring(i, i + 3);
                System.out.println(currentFastaAcid);
            }

        } catch (NullPointerException e) {
            System.out.println(e.getMessage());
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());
        } catch (IOException e) {
            System.out.println(e.getMessage());
        } finally {
            fastaBuffer.close();
        }

    }

}

Answer 1

currentFastaAcid = fastaString.toString().substring(i, i + 3);

Please replace it with

currentFastaAcid = fastaString.substring(i, i + 3);

The toString method of StringBuilder creates a new String object every time you call it, and each of those objects contains a copy of your whole large string. If you call substring directly on the StringBuilder, it returns only a small copy holding the substring. Also remove System.out.println if you don't really need it.
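The difference can be checked in isolation. The following is a minimal sketch, not part of the original answer: the class name `ToStringCopyDemo`, its helper methods, and the short sample sequence are made up for illustration. It only shows that two `toString()` calls return distinct `String` objects with equal content, while `StringBuilder.substring` returns just the requested three characters.

```java
class ToStringCopyDemo {
    // True when two consecutive toString() calls return distinct objects with equal content.
    static boolean toStringAllocatesEachCall(StringBuilder sb) {
        String first = sb.toString();
        String second = sb.toString();
        return first != second && first.equals(second);
    }

    // Reads one three-base group straight from the builder, without copying the whole sequence.
    static String codonAt(StringBuilder sb, int start) {
        return sb.substring(start, start + 3);
    }

    public static void main(String[] args) {
        StringBuilder fastaString = new StringBuilder("AGCTTTTCA"); // made-up sample sequence
        System.out.println(toStringAllocatesEachCall(fastaString));
        System.out.println(codonAt(fastaString, 3));
    }
}
```

With a 2.9 MB sequence, the original loop makes roughly one full copy per three characters, which is why removing the `toString()` call matters far more than the `substring` call itself.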

Answer 2

The big factor here is that you are calling substring on a new String each time.

Instead, call substring directly on the StringBuilder:

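// Stop before a trailing group shorter than three bases, so substring cannot run past the end.
for (int i = 0; i + 3 <= fastaString.length(); i += 3) {
    currentFastaAcid = fastaString.substring(i, i + 3);
    System.out.println(currentFastaAcid);
}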

Also, instead of printing currentFastaAcid each time, save it into a list and print the list at the end:

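// Requires: import java.util.LinkedList; import java.util.List;
List<String> acids = new LinkedList<String>();

// Stop before a trailing group shorter than three bases.
for (int i = 0; i + 3 <= fastaString.length(); i += 3) {
    currentFastaAcid = fastaString.substring(i, i + 3);
    acids.add(currentFastaAcid);
}

System.out.println(acids.toString());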

Answer 3

Besides the debug output, your main problem surely is that, in each iteration of your loop, you create a new String containing all the data read from the file:

currentFastaAcid = fastaString.toString().substring(i, i + 3);

fastaString.toString() gives the same result in each iteration, so the repeated call is redundant. Move it outside the loop and you will surely save some seconds of runtime.
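Hoisting the call could look roughly like this. It is a sketch under assumptions: the class name `HoistedToString` and the method `codons` are hypothetical, and the loop bound drops a trailing group of fewer than three bases, which the original code does not handle.

```java
import java.util.ArrayList;
import java.util.List;

class HoistedToString {
    // Splits the sequence into three-base groups; an incomplete trailing group is dropped.
    static List<String> codons(StringBuilder fastaString) {
        String sequence = fastaString.toString(); // single copy, made once before the loop
        List<String> result = new ArrayList<>(sequence.length() / 3);
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            result.add(sequence.substring(i, i + 3));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample with two leftover bases at the end.
        System.out.println(codons(new StringBuilder("AGCTTTTCAGG")));
    }
}
```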

Answer 4

Apart from the optimizations suggested for the serial code, I would go for parallel processing to reduce the time further. If you have a really big file, you can split the work of reading the file and processing the lines read into separate threads. That way, while one thread is busy reading the next line from the large file, another thread can process the lines already read and print them on the console.
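One way to arrange the two threads described above is a bounded queue between a reader and a processor. The sketch below is an assumption about how that might be wired up, not code from the answer: `PipelinedFasta`, `codons`, the queue size of 1024, and the `END` sentinel are all made up. Because Fasta lines are not necessarily a multiple of three characters long, the processing side carries leftover bases over to the next line. Whether this is faster than the serial version for a file of a few megabytes would need to be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PipelinedFasta {
    // End-of-input marker, compared by identity so it cannot clash with a real line.
    private static final String END = new String("END");

    static List<String> codons(Reader source) {
        BlockingQueue<String> lines = new ArrayBlockingQueue<>(1024);
        Throwable[] failure = new Throwable[1]; // read by the caller only after join()

        // Reader thread: skips the description line, then queues every sequence line.
        Thread reader = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(source)) {
                in.readLine();
                String line;
                while ((line = in.readLine()) != null) {
                    lines.put(line);
                }
                lines.put(END);
            } catch (IOException | InterruptedException e) {
                failure[0] = e;
                lines.clear();    // make room so the marker always fits
                lines.offer(END); // unblock the processing side
            }
        });
        reader.start();

        // Processing side (calling thread): cuts queued lines into three-base groups.
        List<String> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        try {
            for (String line = lines.take(); line != END; line = lines.take()) {
                pending.append(line);
                int i = 0;
                for (; i + 3 <= pending.length(); i += 3) {
                    result.add(pending.substring(i, i + 3));
                }
                pending.delete(0, i); // keep leftover bases for the next line
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while processing", e);
        }
        if (failure[0] != null) {
            throw new IllegalStateException("reading failed", failure[0]);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up input: a description line followed by short sequence lines.
        System.out.println(codons(new StringReader(">desc\nAGCT\nTTTC\nA\n")));
    }
}
```

In the question's program, the `StringReader` would be replaced by the existing `FileReader`.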

Answer 5

If you remove the

System.out.println(currentFastaAcid);

line from the for loop, you will save a fair amount of time.
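If the output is still needed, one option is to send it through a buffered writer instead of calling `System.out.println` once per group. This is a sketch of that alternative, not something the answer proposes: `BufferedCodonOutput` and `writeCodons` are hypothetical names, and a trailing group of fewer than three bases is skipped.

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

class BufferedCodonOutput {
    // Writes each three-base group on its own line; an incomplete trailing group is skipped.
    static void writeCodons(String sequence, PrintWriter out) {
        for (int i = 0; i + 3 <= sequence.length(); i += 3) {
            out.write(sequence, i, 3); // writes the characters directly, no substring copy
            out.write('\n');
        }
    }

    public static void main(String[] args) {
        // The buffer collects many small writes and hands them to the console in large chunks.
        PrintWriter out = new PrintWriter(new BufferedWriter(new OutputStreamWriter(System.out)));
        writeCodons("AGCTTTTCA", out); // made-up sample sequence
        out.flush();
    }
}
```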
