How to read from a huge file and write to a new file in Java

I am reading a file line by line, formatting every line, and writing the result to a new file. The problem is that the file is large, nearly 178 MB, and I always get this error message: IO console updater error, java heap space. Here is my code:

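import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class fileFormat {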
    public static void main(String[] args) throws IOException{

        String strLine;

        FileInputStream fstream = new FileInputStream("train_final.txt");
        BufferedReader reader = new BufferedReader(new InputStreamReader(fstream));
        BufferedWriter writer = new BufferedWriter(new FileWriter("newOUTPUT.txt"));

        while((strLine = reader.readLine()) != null){
            List<String> numberBox = new ArrayList<String>();
            StringTokenizer st = new StringTokenizer(strLine);
            while(st.hasMoreTokens()){
                numberBox.add(st.nextToken());
            }
            for (int i=1; i< numberBox.size(); i++){
                String head = numberBox.get(0);
                String tail = numberBox.get(i);
                String line = head + "  "+tail ;
                System.out.println(line);
                writer.write(line);
                writer.newLine();
            }
            numberBox.clear();
        }
        reader.close();
        writer.close();
    }
}

How can I avoid this error message? I have also set the VM preference: -xms1024m

Remove the line

System.out.println(line);

This works around the failing console updater, which otherwise runs out of memory.

The program looks okay. I suspect the problem is that you run it inside Eclipse, and Eclipse collects everything written to System.out in memory (to display it in the Console window).

 System.out.println(line);

Try running it outside Eclipse, change the Eclipse settings to pipe System.out somewhere else, or remove the line.
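If changing the Eclipse settings is not convenient, System.out can also be redirected from within the program using the standard System.setOut call. The following is only a sketch of that idea: the class name RedirectStdout, the helper runWithStdoutTo, and the file name console.log are made up for illustration and do not come from the question.

```java
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;

class RedirectStdout {

    // Runs the task with System.out pointing at `target`,
    // then restores the original stream even if the task throws.
    static void runWithStdoutTo(OutputStream target, Runnable task) {
        PrintStream original = System.out;
        PrintStream redirected = new PrintStream(target, false);
        System.setOut(redirected);
        try {
            task.run();
        } finally {
            redirected.flush();
            System.setOut(original);
        }
    }

    public static void main(String[] args) throws IOException {
        // "console.log" is a hypothetical file name.
        try (OutputStream log = new BufferedOutputStream(new FileOutputStream("console.log"))) {
            runWithStdoutTo(log, () -> {
                for (int i = 0; i < 5; i++) {
                    System.out.println("line " + i); // goes to the file, not the console
                }
            });
        }
        System.out.println("Done; the redirected output is in console.log");
    }
}
```

With this approach the per-line output never reaches the Eclipse console, so the console has nothing to buffer in memory.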

This part of the code:

       for (int i=1; i< numberBox.size(); i++){
            String head = numberBox.get(0);
            String tail = numberBox.get(i);
            String line = head + "  "+tail ;
            System.out.println(line);
            writer.write(line);
            writer.newLine();
       }

can be rewritten as:

       if (!numberBox.isEmpty()) {   // a blank input line yields no tokens
            String head = numberBox.get(0);
            for (int i=1; i< numberBox.size(); i++){
                 String tail = numberBox.get(i);
                 System.out.print(head);
                 System.out.print(" ");
                 System.out.println(tail);
                 writer.write(head);
                 writer.write(" ");
                 writer.write(tail);
                 writer.newLine();
            }
       }

This adds a little code duplication, but it avoids building a new concatenated String for every output line.

Also, if you merge this for loop with the loop that constructs numberBox, you won't need the numberBox structure at all.
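Merging the two loops might look roughly like the sketch below. The class name PairWriter and the method writePairs are invented for illustration; the file names are the ones from the question. It assumes a single space as the separator, skips blank lines, and leaves out the console output entirely.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.StringTokenizer;

class PairWriter {

    // For each input line, writes one "head tail" line per token after the first.
    // Tokens are consumed straight from the tokenizer, so no list is needed.
    // Returns the number of lines written.
    static long writePairs(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        long written = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line);
            if (!st.hasMoreTokens()) {
                continue; // blank line: nothing to write
            }
            String head = st.nextToken();
            while (st.hasMoreTokens()) {
                writer.write(head);
                writer.write(' ');
                writer.write(st.nextToken());
                writer.newLine();
                written++;
            }
        }
        writer.flush(); // push buffered output to the underlying writer
        return written;
    }

    public static void main(String[] args) throws IOException {
        try (Reader in = new FileReader("train_final.txt");
             Writer out = new FileWriter("newOUTPUT.txt")) {
            long n = writePairs(in, out);
            System.out.println("Wrote " + n + " lines"); // one summary line only
        }
    }
}
```

Only one input line and its tokenizer are held in memory at a time, so the heap use does not grow with the file size.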

If you read the whole file into memory, it will fill the heap, so a better option is to read the file in chunks. See my code below. It starts reading at the line offset given as an argument and returns the end offset. You need to pass the number of lines to read.

Please remember: you can use any collection to store the lines that were read, and clear the collection before calling this method to read the next chunk.

FileInputStream fis = new FileInputStream(file);
InputStreamReader   streamReader = new InputStreamReader(fis, "UTF-8");
LineNumberReader   reader = new LineNumberReader(streamReader);

// Call the method below repeatedly until the end of the file is reached.

public int getParsedLines(LineNumberReader reader, int iLineNumber_Start, int iNumberOfLinesToBeRead) {
    int iLineNumber_End = 0;

    int iReadUptoLines = iLineNumber_Start + iNumberOfLinesToBeRead;

    try {
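        // No mark() call is needed: its argument is a read-ahead limit in characters,
        // not a line number, and reset() is never used here.
        // setLineNumber only sets the counter; reading continues from the current position.
        reader.setLineNumber(iLineNumber_Start);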
        do {
            String str = reader.readLine();
            if (str == null) {
                break;
            }
            // your code


            iLineNumber_End = reader.getLineNumber();
        } while (iLineNumber_End != iReadUptoLines);
    } catch (Exception ex) {
        // exception handling
    }
    return iLineNumber_End;
}
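A driver loop for this kind of chunked reading could be sketched as follows. It is an illustration, not the answer's exact code: the class name ChunkedReadDemo and the method processInChunks are hypothetical, getParsedLines is given an extra `sink` parameter in place of the "your code" placeholder, and the chunk size is assumed to be positive.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.LineNumberReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ChunkedReadDemo {

    // Same idea as getParsedLines above, except that the lines read are added to `sink`.
    // Returns the line number reached, or 0 if nothing could be read.
    static int getParsedLines(LineNumberReader reader, int startLine, int linesToRead,
                              List<String> sink) throws IOException {
        int endLine = 0;
        int readUpTo = startLine + linesToRead;
        reader.setLineNumber(startLine); // sets the counter only, not the position
        do {
            String str = reader.readLine();
            if (str == null) {
                break; // end of file
            }
            sink.add(str);
            endLine = reader.getLineNumber();
        } while (endLine != readUpTo);
        return endLine;
    }

    // Reads the source chunk by chunk, reusing one list, and returns the total line count.
    static int processInChunks(Reader source, int chunkSize) throws IOException {
        List<String> chunk = new ArrayList<>();
        int total = 0;
        try (LineNumberReader reader = new LineNumberReader(source)) {
            int start = 0;
            while (true) {
                chunk.clear(); // drop the previous chunk before reading the next one
                getParsedLines(reader, start, chunkSize, chunk);
                if (chunk.isEmpty()) {
                    break; // nothing left to read
                }
                total += chunk.size(); // process the chunk here
                if (chunk.size() < chunkSize) {
                    break; // short chunk: end of file reached
                }
                start += chunk.size();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // File name taken from the question; 10000 is an arbitrary chunk size.
        Reader in = new InputStreamReader(new FileInputStream("train_final.txt"), StandardCharsets.UTF_8);
        System.out.println("Lines read: " + processInChunks(in, 10_000));
    }
}
```

At most one chunk of lines is held in memory at a time, because the list is cleared before each call.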
