
How to know the Image or Picture Location while parsing MS Word Doc in java using apache poi


HWPFDocument wordDoc = new HWPFDocument(new FileInputStream(fileName));
List<Picture> picturesList = wordDoc.getPicturesTable().getAllPictures();

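The statement above gives me a list of all the pictures in the document. What I want to know is which text, or which position in the document, each image comes after.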

You're looking at the pictures the wrong way, which is why you can't find any location!

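What you need to do is process the document's CharacterRuns one at a time, pass each one to the PicturesTable, and check whether that character run contains a picture. If it does, fetch the picture back from the table. Since you know which run you are on, you also know where the picture sits in the document.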

In its simplest form:

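// document is the HWPFDocument opened as in the question.
// PicturesSource is not part of POI; it is the helper class from
// Apache Tika's Word parser (its source is given further down).
PicturesSource pictures = new PicturesSource(document);
PicturesTable pictureTable = document.getPicturesTable();

Range r = document.getRange();
for (int i = 0; i < r.numParagraphs(); i++) {
    Paragraph p = r.getParagraph(i);
    for (int j = 0; j < p.numCharacterRuns(); j++) {
        CharacterRun cr = p.getCharacterRun(j);
        if (pictureTable.hasPicture(cr)) {
            Picture picture = pictures.getFor(cr);
            // The picture's position is known here: it is in paragraph i,
            // at character offset cr.getStartOffset() of the document text.
            // Do something useful with the picture
        }
    }
}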
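Once you know which run holds the picture, answering "after which text?" comes down to taking the paragraph text up to that run's offset. The sketch below models that step without POI so that it runs on its own. It is an illustration under stated assumptions, not a POI or Tika API: the `Run` class is a hypothetical stand-in for a `CharacterRun` (its start offset plus whether `PicturesTable.hasPicture` would be true), `textBeforePictures` is a hypothetical helper, and the picture is assumed to occupy a one-character run holding the 0x01 placeholder. With real POI objects, the offsets would come from `cr.getStartOffset()` and `p.getStartOffset()`, and the text from `p.text()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PicturePositions {

    // Hypothetical stand-in for a POI CharacterRun: the run's start offset in the
    // document text and whether PicturesTable.hasPicture(run) would return true.
    static final class Run {
        final int startOffset;
        final boolean hasPicture;

        Run(int startOffset, boolean hasPicture) {
            this.startOffset = startOffset;
            this.hasPicture = hasPicture;
        }
    }

    // For each picture run in a paragraph, returns the paragraph text that
    // precedes it, i.e. the text the picture comes after.
    static List<String> textBeforePictures(String paragraphText, int paragraphStart, List<Run> runs) {
        List<String> result = new ArrayList<>();
        for (Run run : runs) {
            if (!run.hasPicture) {
                continue;
            }
            // Turn the document-wide offset into an index inside this paragraph,
            // clamped so that an out-of-range offset cannot throw.
            int local = Math.max(0, Math.min(run.startOffset - paragraphStart, paragraphText.length()));
            result.add(paragraphText.substring(0, local));
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example: a paragraph starting at document offset 100, with the
        // inline picture placeholder character (0x01) right after "See the chart: ".
        String text = "See the chart: \u0001 and the table.";
        List<Run> runs = Arrays.asList(
                new Run(100, false),  // "See the chart: "
                new Run(115, true),   // the picture placeholder
                new Run(116, false)); // " and the table."
        for (String before : textBeforePictures(text, 100, runs)) {
            System.out.println("Picture follows: \"" + before + "\"");
        }
    }
}
```

Converting to a paragraph-local index, rather than searching for the 0x01 character in the text, keeps the position tied to the run that `hasPicture` actually matched.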

You can find a good example of this in the Apache Tika parser for Microsoft Word .doc files, which is built on Apache POI.

You should add the PicturesSource class:

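import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

// Adapted from the PicturesSource helper in Apache Tika's Word (.doc) parser
public class PicturesSource {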

private PicturesTable picturesTable;
private Set<Picture> output = new HashSet<Picture>();
private Map<Integer, Picture> lookup;
private List<Picture> nonU1based;
private List<Picture> all;
private int pn = 0;

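public PicturesSource(HWPFDocument doc) {
    picturesTable = doc.getPicturesTable();
    all = picturesTable.getAllPictures();

    // Map each picture's start offset to the picture, so that a
    // CharacterRun can be resolved through its picture offset (see getFor)
    lookup = new HashMap<Integer, Picture>();
    for (Picture p : all) {
        lookup.put(p.getStartOffset(), p);
    }

    // Work out which pictures are NOT referenced by a picture run in the
    // main text: start with all of them and null out each one that is found
    nonU1based = new ArrayList<Picture>();
    nonU1based.addAll(all);
    Range r = doc.getRange();
    for (int i = 0; i < r.numCharacterRuns(); i++) {
        CharacterRun cr = r.getCharacterRun(i);
        if (picturesTable.hasPicture(cr)) {
            Picture p = getFor(cr);
            int at = nonU1based.indexOf(p);
            if (at >= 0) { // indexOf returns -1 when p is not in the list
                nonU1based.set(at, null);
            }
        }
    }
}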


private boolean hasPicture(CharacterRun cr) {
    return picturesTable.hasPicture(cr);
}

private void recordOutput(Picture picture) {
    output.add(picture);
}

private boolean hasOutput(Picture picture) {
    return output.contains(picture);
}

private int pictureNumber(Picture picture) {
    return all.indexOf(picture) + 1;
}

// Resolves the Picture a picture run points at, using the offset map built in the constructor
public Picture getFor(CharacterRun cr) {
    return lookup.get(cr.getPicOffset());
}


private Picture nextUnclaimed() {
    Picture p = null;
    while (pn < nonU1based.size()) {
        p = nonU1based.get(pn);
        pn++;
        if (p != null) return p;
    }
    return null;
}

}
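The less obvious part of the constructor is its second loop. It starts from every picture in the PicturesTable, crosses out each one that some character run in the main text points at, and keeps the rest (`nonU1based`). Those leftover pictures have no inline position, typically floating pictures or ones outside the main text, and `nextUnclaimed()` hands them out in order. Below is a stdlib-only sketch of that bookkeeping with pictures reduced to their integer offsets. `unclaimed` is a hypothetical helper name, not part of POI or Tika, and it uses a set lookup in place of nulling out list entries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class UnclaimedPictures {

    // allOffsets: start offsets of every picture, in PicturesTable order.
    // referencedOffsets: picture offsets found on picture runs in the main text
    // (in POI these would come from cr.getPicOffset()).
    // Returns the offsets nothing in the text points at, preserving their order.
    static List<Integer> unclaimed(List<Integer> allOffsets, List<Integer> referencedOffsets) {
        Set<Integer> claimed = new HashSet<>(referencedOffsets);
        List<Integer> result = new ArrayList<>();
        for (Integer offset : allOffsets) {
            if (!claimed.contains(offset)) {
                result.add(offset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example: three pictures, and the text references the first and the third.
        List<Integer> all = Arrays.asList(0, 4096, 9000);
        List<Integer> referenced = Arrays.asList(0, 9000);
        System.out.println(unclaimed(all, referenced)); // prints [4096]
    }
}
```

The set-based version gives the same leftovers as the null-marking loop, and a picture offset with no match simply stays unclaimed instead of triggering a `set(-1, ...)` call.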


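Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.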

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM