Java - Cannot complete writing text file
I need to process a large text file (approximately 600 MB) to format it correctly, writing the formatted output to a new text file. The problem is that writing the content to the new file stops at about 6.2 MB. Here is the code:
/* Analysis of the text in fileName to see if the lines are in the correct format
* (Theme\tDate\tTitle\tDescription). If there are lines that are in the incorrect format,
* the method corrects them.
*/
public static void cleanTextFile(String fileName, String destFile) throws IOException {
OutputStreamWriter writer = null;
BufferedReader reader = null;
try {
writer = new OutputStreamWriter(new FileOutputStream(destFile), "UTF8");
} catch (IOException e) {
System.out.println("Could not open or create the file " + destFile);
}
try {
reader = new BufferedReader(new FileReader(fileName));
} catch (FileNotFoundException e) {
System.out.println("The file " + fileName + " doesn't exist in the folder.");
}
String line;
String[] splitLine;
StringBuilder stringBuilder = new StringBuilder("");
while ((line = reader.readLine()) != null) {
splitLine = line.split("\t");
stringBuilder.append(line);
/* If the String array resulting of the split operation doesn't have size 4,
* then it means that there are elements of the news item missing in the line
*/
while (splitLine.length != 4) {
line = reader.readLine();
stringBuilder.append(line);
splitLine = stringBuilder.toString().split("\t");
}
stringBuilder.append("\n");
writer.write(stringBuilder.toString());
stringBuilder = new StringBuilder("");
writer.flush();
}
writer.close();
reader.close();
}
I've already looked for answers, but the problem is usually that the writer is not being closed or that the flush() call is missing. Therefore, I think the problem is in the BufferedReader. What am I missing?
Look at this loop:
while (splitLine.length != 4) {
line = reader.readLine();
stringBuilder.append(line);
splitLine = stringBuilder.toString().split("\t");
}
If you ever end up with more than 4 items in splitLine, you'll just keep reading data forever... you won't even notice when you've reached the end of the file, as you'll just keep appending null to the StringBuilder. I don't know whether this is what's happening (we don't know what your data looks like) but it's certainly feasible, and you should guard against it.
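One possible way to guard the loop is sketched below. It is only an illustration: the class name GuardedCleaner and the method clean are made up for this example. It also assumes that a record with more than four fields should be written out unchanged and that a truncated record at the end of the file should be written as-is; the real data may call for different handling.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Writer;

// Hypothetical helper, not part of the original code.
final class GuardedCleaner {
    private static final int EXPECTED_FIELDS = 4;

    /** Joins physical lines until a record has at least four tab-separated fields. */
    static void clean(BufferedReader reader, Writer writer) throws IOException {
        StringBuilder record = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            record.append(line);
            // "<" instead of "!=": a record with too many fields no longer loops forever
            while (record.toString().split("\t").length < EXPECTED_FIELDS) {
                line = reader.readLine();
                if (line == null) {
                    break; // end of file: stop instead of appending the text "null"
                }
                record.append(line);
            }
            writer.write(record.toString());
            writer.write('\n');
            record.setLength(0); // reuse the same builder for the next record
        }
        writer.flush();
    }
}
```

Taking a BufferedReader and a Writer as parameters keeps the method easy to try out on a small in-memory sample (for example with StringReader and StringWriter) before running it on the 600 MB file.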
(You should also use a try/finally block for closing resources, but that's a separate matter.)
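As a rough sketch of the resource handling, the example below uses try-with-resources, which on Java 7 and later does the same job as a hand-written try/finally. The class name ResourceSafeCopy and the method copyLines are invented for illustration, and the sketch assumes the input is UTF-8 (the original code reads with the platform default charset via FileReader).

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original code.
final class ResourceSafeCopy {
    /** Copies source to dest line by line; both files are closed even if an exception is thrown. */
    static void copyLines(Path source, Path dest) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(dest, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.write('\n');
            }
        } // the writer is flushed and closed here, then the reader is closed
    }
}
```

Because opening the files and using them happen in the same try statement, a failure to open either file propagates as an exception instead of leaving reader or writer set to null for later code to trip over.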
Separate out the FileOutputStream as its own variable and close it, too:
FileOutputStream fos = new FileOutputStream(destFile);
writer = new OutputStreamWriter(fos);
...
writer.flush();
fos.flush();
You may replace
stringBuilder = new StringBuilder("");
with
stringBuilder.setLength( 0 );
Use your own parser based on line.indexOf('\t', from) in place of String.split().
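A minimal sketch of what such a parser could look like is shown here. The class name TabFields and the method countFields are hypothetical. Note one behavioural difference that is an assumption of this sketch: it counts trailing empty fields, whereas String.split("\t") silently drops them.

```java
// Hypothetical helper, not part of the original code.
final class TabFields {
    /** Counts tab-separated fields without allocating an array of substrings. */
    static int countFields(String line) {
        int fields = 1; // a line with no tab still holds one field
        int from = 0;
        int tab;
        while ((tab = line.indexOf('\t', from)) != -1) {
            fields++;
            from = tab + 1; // continue searching just past the tab that was found
        }
        return fields;
    }
}
```

For the loop in the question, only the number of fields matters, so counting tabs avoids re-splitting the whole accumulated string on every iteration.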