
Java - Cannot complete writing text file

I need to process a large text file (approximately 600 MB) to format it correctly, writing the formatted output to a new text file. The problem is that writing to the new file stops at about 6.2 MB. Here is the code:

    /* Analysis of the text in fileName to see if the lines are in the correct format
     * (Theme\tDate\tTitle\tDescription). If there are lines that are in the incorrect format,
     * the method corrects them.
     */
    public static void cleanTextFile(String fileName, String destFile) throws IOException {
        OutputStreamWriter writer = null;
        BufferedReader reader = null;

        try {
            writer = new OutputStreamWriter(new FileOutputStream(destFile), "UTF8");
        } catch (IOException e) {
            System.out.println("Could not open or create the file " + destFile);
        }

        try {
            reader = new BufferedReader(new FileReader(fileName));
        } catch (FileNotFoundException e) {
            System.out.println("The file " + fileName + " doesn't exist in the folder.");
        }

        String line;
        String[] splitLine;
        StringBuilder stringBuilder = new StringBuilder("");

        while ((line = reader.readLine()) != null) {
            splitLine = line.split("\t");
            stringBuilder.append(line);

            /* If the String array resulting of the split operation doesn't have size 4,
             * then it means that there are elements of the news item missing in the line
             */
            while (splitLine.length != 4) {
                line = reader.readLine();
                stringBuilder.append(line);

                splitLine = stringBuilder.toString().split("\t");
            }
            stringBuilder.append("\n");
            writer.write(stringBuilder.toString());
            stringBuilder = new StringBuilder("");

            writer.flush();
        }

        writer.close();
        reader.close();

    }

I've already looked for answers, but the problem is usually that the writer is not closed or that flush() is never called. Since I do both, I think the problem is in the BufferedReader. What am I missing?

Look at this loop:

while (splitLine.length != 4) {
    line = reader.readLine();
    stringBuilder.append(line);

    splitLine = stringBuilder.toString().split("\t");
}

If you ever end up with more than 4 items in splitLine, you'll just keep reading data forever... you won't even notice when you've reached the end of the file, as you'll just keep appending null to the StringBuilder. I don't know whether this is what's happening (we don't know what your data looks like), but it's certainly feasible, and you should guard against it.
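One way to add that guard is sketched below. It is only an illustration: the class name RecordReader, the method readRecord and the choice to throw exceptions are assumptions, not part of the question's code. It keeps the question's behaviour of joining continuation lines without a separator, but it stops at the end of the input and rejects records that end up with more than four fields.

```java
import java.io.BufferedReader;
import java.io.EOFException;
import java.io.IOException;

class RecordReader {
    static final int FIELDS = 4; // Theme, Date, Title, Description

    /** Returns the next complete record, or null at a clean end of input. */
    static String readRecord(BufferedReader reader) throws IOException {
        String line = reader.readLine();
        if (line == null) {
            return null; // nothing left to read
        }
        StringBuilder record = new StringBuilder(line);
        int fields = record.toString().split("\t").length;

        // Only keep reading while fields are missing; never loop on "too many".
        while (fields < FIELDS) {
            line = reader.readLine();
            if (line == null) {
                // Guard 1: end of file in the middle of a record.
                throw new EOFException("Input ended inside an incomplete record: " + record);
            }
            record.append(line);
            fields = record.toString().split("\t").length;
        }
        if (fields > FIELDS) {
            // Guard 2: the joined text overshot the expected field count.
            throw new IOException("Record has " + fields + " fields, expected "
                    + FIELDS + ": " + record);
        }
        return record.toString();
    }
}
```

Whether a malformed record should abort the run, as here, or be logged and skipped depends on the data; the sketch fails fast so that the problem is visible instead of silently stopping the output.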

(You should also use a try / finally block for closing resources, but that's a separate matter.)
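As a rough sketch of that resource handling, the following uses try-with-resources (Java 7 and later), which is a swapped-in shorthand for the try / finally pattern mentioned above. The class name CopyLines, the method copyLines and the assumption that the input is UTF-8 are illustrative only; the question's code reads with the platform default encoding.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CopyLines {
    /** Copies source to dest line by line and returns the number of lines written. */
    static long copyLines(Path source, Path dest) throws IOException {
        long count = 0;
        // Both resources are closed automatically, in reverse order,
        // even if readLine() or write() throws.
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(dest, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write('\n');
                count++;
            }
        }
        // Opening failures are not caught here: they propagate to the caller
        // instead of leaving a null reader or writer behind.
        return count;
    }
}
```

Closing the BufferedWriter flushes it, so no explicit flush() per line is needed in this form.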

Separate out the FileOutputStream as its own variable and close it, too:

FileOutputStream fos = new FileOutputStream(destFile);
writer = new OutputStreamWriter(fos);

   ...

writer.flush();
fos.flush();

  1. The try/catch blocks are not well coded: if opening a file fails, the error is only printed and processing continues with a null reader or writer.
  2. You may replace

      stringBuilder = new StringBuilder("");

    by

      stringBuilder.setLength(0);

  3. Use your own parser based on line.indexOf('\t', from) in place of String.split().
  4. Add the parts obtained with line.substring(b, e) to a List<String>.
  5. Use a PrintStream with the correct character set; use the constructor with two parameters.
  6. Write the information 4 by 4, consuming data from the list, when list.size() >= 4.
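Steps 3 to 6 might be sketched as follows. The names FieldRegrouper, splitInto, drain and regroup are hypothetical. Note one assumption that changes behaviour: this approach treats every line break as a field boundary, whereas the question's code joins a broken line to the next one without a separator, so it only fits data where lines break between fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

class FieldRegrouper {
    static final int FIELDS_PER_RECORD = 4;

    /** Splits line on tabs with indexOf/substring and appends the parts to fields. */
    static void splitInto(String line, List<String> fields) {
        int from = 0;
        int tab;
        while ((tab = line.indexOf('\t', from)) >= 0) {
            fields.add(line.substring(from, tab));
            from = tab + 1;
        }
        fields.add(line.substring(from)); // text after the last tab
    }

    /** Writes complete records, four fields at a time, and removes them from the list. */
    static void drain(List<String> fields, PrintStream out) {
        while (fields.size() >= FIELDS_PER_RECORD) {
            List<String> record = fields.subList(0, FIELDS_PER_RECORD);
            out.print(String.join("\t", record));
            out.print('\n');
            record.clear(); // clearing the view removes the fields from the backing list
        }
    }

    /** Regroups the whole input and returns the fields left over at the end (fewer than four). */
    static List<String> regroup(BufferedReader in, PrintStream out) throws IOException {
        List<String> fields = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            splitInto(line, fields);
            drain(fields, out);
        }
        // PrintStream swallows write errors, so they have to be checked explicitly.
        if (out.checkError()) {
            throw new IOException("Writing the output failed");
        }
        return fields;
    }
}
```

For a file target, the two-parameter constructor from step 5 would be something like new PrintStream(destFile, "UTF-8"), which takes a file name and a charset name.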
