Java POI fails to write a large Word file
My code below works correctly when the input file is not large (for example, less than 1 MB). However, when I process a large input.doc of about 4 MB, output.doc is not generated correctly: I cannot open the file.
Does anyone have a better idea for writing a large file correctly? Or is there any other Java code that can delete "Enter" (paragraph breaks) in a large .doc file?
Thank you very much.
package mydoc;

import org.apache.poi.poifs.filesystem.POIFSFileSystem;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class test {
    /* "Enter" (carriage return / paragraph mark) is ASCII 13 */
    private static final char ENTER_ASCII = 13;

    public static void main(String[] args) {
        /* the location of the input file */
        String fileName = "D:\\input.doc";
        deleteEnter(fileName);
    }

    public static void deleteEnter(String fileName) {
        // try-with-resources closes the file system and the output stream, even on error
        try (POIFSFileSystem fs = new POIFSFileSystem(new FileInputStream(fileName));
             FileOutputStream fos = new FileOutputStream("D:\\output.doc")) {
            HWPFDocument doc = new HWPFDocument(fs);
            Range range = doc.getRange();
            // Walk backwards so deleting paragraph i does not shift the
            // paragraphs that have not been checked yet
            for (int i = range.numParagraphs() - 1; i >= 0; i--) {
                Paragraph paragraph = range.getParagraph(i);
                String text = paragraph.text();
                // Delete paragraphs that start with a paragraph mark (empty lines)
                if (!text.isEmpty() && text.charAt(0) == ENTER_ASCII) {
                    paragraph.delete();
                }
            }
            doc.write(fos);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
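Separately from the file-size problem, a forward loop that deletes paragraphs by index can skip entries. After paragraph i is removed, the next paragraph moves into slot i, but the loop continues at i + 1. The sketch below shows the effect on a plain `java.util.List` of strings standing in for paragraphs. This is an illustration only and does not use POI. The class and method names are hypothetical. Iterating backwards avoids the skip.

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteWhileIterating {
    // Treat a paragraph as "empty" when its text starts with a carriage return (char 13)
    static boolean isEmptyParagraph(String text) {
        return !text.isEmpty() && text.charAt(0) == '\r';
    }

    // Forward loop: after remove(i), the next element shifts into slot i and is never checked
    static List<String> deleteForward(List<String> input) {
        List<String> paras = new ArrayList<>(input);
        for (int i = 0; i < paras.size(); i++) {
            if (isEmptyParagraph(paras.get(i))) {
                paras.remove(i);
            }
        }
        return paras;
    }

    // Backward loop: removing index i never moves elements that are still unchecked
    static List<String> deleteBackward(List<String> input) {
        List<String> paras = new ArrayList<>(input);
        for (int i = paras.size() - 1; i >= 0; i--) {
            if (isEmptyParagraph(paras.get(i))) {
                paras.remove(i);
            }
        }
        return paras;
    }

    public static void main(String[] args) {
        List<String> paras = List.of("Title\r", "\r", "\r", "Body\r");
        System.out.println(deleteForward(paras).size());  // 3: one blank paragraph survives
        System.out.println(deleteBackward(paras).size()); // 2
    }
}
```

With two consecutive blank paragraphs, the forward loop removes only the first one. The backward loop removes both.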
Is "enter" the line separator? The line separator is platform-dependent, so I propose the following solution:
import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

String separator = System.getProperty("line.separator");
File file = new File(fileName);
String[] fileData;
try (FileInputStream fis = new FileInputStream(file)) {
    HWPFDocument document = new HWPFDocument(fis);
    WordExtractor extractor = new WordExtractor(document);
    fileData = extractor.getParagraphText();
}
for (int i = 0; i < fileData.length; i++) {
    if (fileData[i] != null)
        fileData[i] = fileData[i].replace(separator, "");
}
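You can also avoid depending on the platform's line separator altogether. One option is to strip the carriage-return and line-feed characters themselves. A Word paragraph mark is character 13 (the question's `ENTER_ASCII`), and `System.getProperty("line.separator")` is "\r\n" only on Windows, so matching both characters explicitly behaves the same on every platform. The helper name `stripLineBreaks` is hypothetical. The sketch works on any `String[]`, such as one returned by `getParagraphText()`.

```java
import java.util.Arrays;

public class LineBreakStripper {
    // Hypothetical helper: removes every '\r' and '\n' from each paragraph,
    // regardless of what System.getProperty("line.separator") returns
    static String[] stripLineBreaks(String[] paragraphs) {
        String[] cleaned = new String[paragraphs.length];
        for (int i = 0; i < paragraphs.length; i++) {
            // Keep null entries as-is, mirroring the null check in the loop above
            cleaned[i] = paragraphs[i] == null ? null : paragraphs[i].replaceAll("[\\r\\n]", "");
        }
        return cleaned;
    }

    public static void main(String[] args) {
        String[] data = {"First line\r\n", "\r\n", "Second\rline\n", null};
        System.out.println(Arrays.toString(stripLineBreaks(data)));
        // prints [First line, , Secondline, null]
    }
}
```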
Then you just have to write fileData to a clean .doc file.
Depending on your needs, you could even use a Word macro. You should also be able to use a wildcard (regex-like) pattern such as "^13{2,}", but that didn't work for me in Word 2010; see http://social.msdn.microsoft.com/Forums/en-US/0d921f97-b59a-48a9-a01a-20fe72f21c19/how-to-remove-blank-lines-?forum=worddev
Sub RemoveBlankLines()
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^p^p"
        .Replacement.Text = "^p"
        .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub RemoveEnters()
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        '^11 or ^l New line
        .Text = "^l"
        .Replacement.Text = ""
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        '^13 or ^p Carriage return/paragraph mark
        .Text = "^p"
        .Replacement.Text = ""
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
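If the text has already been extracted into Java (for example with `WordExtractor`), the intent of the "^13{2,}" wildcard and the `RemoveBlankLines` macro can be approximated with a Java regular expression. Every run of two or more paragraph marks (character 13) collapses into one. This is a sketch on a plain `String`, not a Word macro or a POI call. The name `collapseBlankParagraphs` is hypothetical.

```java
public class BlankParagraphCollapser {
    // Hypothetical helper: replaces each run of two or more '\r' (Word's paragraph
    // mark, char 13) with a single '\r', similar in intent to "^13{2,}" -> "^p"
    static String collapseBlankParagraphs(String text) {
        return text.replaceAll("\\r{2,}", "\r");
    }

    public static void main(String[] args) {
        String text = "Title\r\r\r\rBody\rEnd\r";
        String collapsed = collapseBlankParagraphs(text);
        // Print the paragraph marks as visible "\r" sequences
        System.out.println(collapsed.replace("\r", "\\r"));
        // prints Title\rBody\rEnd\r
    }
}
```

A single regex pass handles runs of any length. A literal "^p^p" to "^p" replacement reduces only one pair per match.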
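Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.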