
Java POI fails to write a large Word file

I am using Apache POI to delete "enter" characters (that is, blank lines) from a .doc file.

My code below works correctly when the input file is small (for example, less than 1 MB). However, when I process a large input.doc of about 4 MB, output.doc is not generated correctly and I cannot open it.

Does anyone have a better idea for writing the large file correctly? Or is there any other Java code that can delete "enter" from a large .doc file? Thank you very much.

package mydoc;

import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.hwpf.*;
import org.apache.poi.hwpf.usermodel.*;
import java.io.*;

public class test {
/*The ASCII of "Enter" is 13*/
private static final short ENTER_ASCII = 13;

public static void main(String[] args){
    /* the location of the input file   */
    String fileName = "D:\\input.doc";

    deleteEnter(fileName);
}

public static void deleteEnter(String fileName){

    POIFSFileSystem fs = null;
    try{
        fs = new POIFSFileSystem(new FileInputStream(fileName));
        HWPFDocument doc = new HWPFDocument(fs);            

        Range range = doc.getRange();

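        // Iterate backwards so that deleting a paragraph cannot shift
        // the index of a paragraph that has not been visited yet.
        for (int i = range.numParagraphs() - 1; i >= 0; i--)
        {
            Paragraph paragraph = range.getParagraph(i);
            String text = paragraph.text();
            // Guard against empty text before reading the first character.
            if (!text.isEmpty() && text.charAt(0) == ENTER_ASCII)
            {
                paragraph.delete();
            }
        }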

        // try-with-resources closes the stream even if write() throws.
        try (FileOutputStream fos = new FileOutputStream(new File("D:\\output.doc")))
        {
            doc.write(fos);
        }

    }//end try
    catch (Exception e){
        e.printStackTrace();
    }//end catch
}                                       

}
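
When paragraphs are removed by index, walking from the last index down to the first avoids skipping the entry that follows a removed one. The sketch below shows only that idea on a plain `List<String>` standing in for the document's paragraphs; the class and method names are hypothetical and it does not use the POI API, so it says nothing about whether POI can write the large file correctly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankParagraphFilter {
    /** Word stores the paragraph mark as a carriage return (code 13). */
    private static final char PARAGRAPH_MARK = '\r';

    /** Removes, in place, every entry that starts with a paragraph mark (an empty paragraph). */
    public static void removeBlankParagraphs(List<String> paragraphs) {
        // Walk from the end so a removal never shifts an index not yet visited.
        for (int i = paragraphs.size() - 1; i >= 0; i--) {
            String text = paragraphs.get(i);
            if (!text.isEmpty() && text.charAt(0) == PARAGRAPH_MARK) {
                paragraphs.remove(i);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample data: two real paragraphs and three blank ones.
        List<String> paragraphs = new ArrayList<>(
                Arrays.asList("Title\r", "\r", "\r", "Body\r", "\r"));
        removeBlankParagraphs(paragraphs);
        System.out.println(paragraphs.size()); // prints 2
    }
}
```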

"Enter" is the line separator, right? It's platform dependent, so I propose the following solution:

// Requires org.apache.poi.hwpf.HWPFDocument, org.apache.poi.hwpf.extractor.WordExtractor and java.io.*
String separator = System.getProperty("line.separator");
File file = new File(filename);
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
HWPFDocument document = new HWPFDocument(fis);
WordExtractor extractor = new WordExtractor(document);
// One string per paragraph of the document
String[] fileData = extractor.getParagraphText();
for (int i = 0; i < fileData.length; i++) {
    if (fileData[i] != null)
        fileData[i] = fileData[i].replace(separator, "");
}

Then you just have to write fileData out to a clean .doc file.
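
Before writing the text out, it may help to drop the entries that are empty after stripping. The following is a minimal sketch of that cleanup step using only the standard library; `ParagraphCleaner` and `clean` are hypothetical names. It assumes the extracted strings may end in `\r` or `\r\n` whatever the platform, so it strips both characters instead of relying on `line.separator`.

```java
import java.util.ArrayList;
import java.util.List;

public class ParagraphCleaner {
    /** Strips CR and LF characters and drops entries that are null or end up empty. */
    public static List<String> clean(String[] fileData) {
        List<String> cleaned = new ArrayList<>();
        for (String paragraph : fileData) {
            if (paragraph == null) {
                continue;
            }
            String text = paragraph.replace("\r", "").replace("\n", "");
            if (!text.isEmpty()) {
                cleaned.add(text);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Hypothetical sample of what a paragraph array might look like.
        String[] fileData = {"First paragraph\r\n", "\r\n", null, "Second paragraph\r"};
        System.out.println(clean(fileData)); // prints [First paragraph, Second paragraph]
    }
}
```

The returned list holds only the non-blank paragraphs, in their original order.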

Depending on your needs, you could even use a macro. You should even be able to use a regex like this: "^13{2,}", but that didn't work for me in Word 2010; see http://social.msdn.microsoft.com/Forums/en-US/0d921f97-b59a-48a9-a01a-20fe72f21c19/how-to-remove-blank-lines-?forum=worddev

Sub RemoveBlankLines()
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = "^p^p"
        .Replacement.Text = "^p"
        .MatchWildcards = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub

Sub RemoveEnters()
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        '^11 or ^l  New line
        .Text = "^l"
        .Replacement.Text = ""
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    With Selection.Find
        '^13 or ^p  Carriage return/paragraph mark
        .Text = "^p"
        .Replacement.Text = ""
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
End Sub
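
The idea behind RemoveBlankLines (replace repeated paragraph marks with a single one) can also be sketched in Java on text that has already been extracted as a string. This is only an illustration with a hypothetical class name; it uses `java.lang.String.replaceAll` with a Java regex, not Word's wildcard syntax, and it assumes paragraphs are separated by `\r`.

```java
public class ParagraphMarkCollapser {
    /** Collapses every run of two or more carriage returns into a single one. */
    public static String collapse(String text) {
        // \r{2,} matches two or more consecutive paragraph marks.
        return text.replaceAll("\\r{2,}", "\r");
    }

    public static void main(String[] args) {
        String text = "a\r\r\rb\rc\r\r";
        String collapsed = collapse(text);
        System.out.println(collapsed.equals("a\rb\rc\r")); // prints true
    }
}
```

Unlike a single replace of two marks with one, the `{2,}` quantifier handles a run of any length in one pass.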
