How to know the image or picture location while parsing an MS Word .doc in Java using Apache POI
HWPFDocument wordDoc = new HWPFDocument(new FileInputStream(fileName));
List<Picture> picturesList = wordDoc.getPicturesTable().getAllPictures();
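The statements above give me a list of all the pictures in the document. I want to know which text, or which position in the document, each image comes after.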
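You're looking at the pictures the wrong way, which is why you can't find any position for them!
What you need to do is process each CharacterRun of the document in turn. Pass it to the PicturesTable and check whether that character run contains a picture. If it does, fetch the picture back from the table. Since you have the run, you also know where in the document the picture is.
At its simplest, that's: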
PicturesSource pictures = new PicturesSource(document);
PicturesTable pictureTable = document.getPicturesTable();
Range r = document.getRange();
for (int i = 0; i < r.numParagraphs(); i++) {
    Paragraph p = r.getParagraph(i);
    for (int j = 0; j < p.numCharacterRuns(); j++) {
        CharacterRun cr = p.getCharacterRun(j);
        if (pictureTable.hasPicture(cr)) {
            Picture picture = pictures.getFor(cr);
            // The picture sits in paragraph i: cr.getStartOffset() is its
            // character position in the document, p.text() the text around it.
            // Do something useful with the picture
        }
    }
}
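If you'd rather not copy PicturesSource, POI's own PicturesTable can build the Picture for a run directly with extractPicture(CharacterRun, boolean). Below is a minimal sketch, assuming a binary Word .doc (HWPF) whose path is passed as the first argument; the class name PictureLocator and the tail helper are made up for illustration. For each inline picture it prints the paragraph index, the run's character offset, and the last few characters of text that precede it, which is the "which text does the image come after" part of the question.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

// Hypothetical helper class: reports where each inline picture sits in a .doc.
public class PictureLocator {

    // Illustrative helper: the last `max` characters of the trimmed text.
    static String tail(String text, int max) {
        String t = text.trim();
        return t.length() <= max ? t : t.substring(t.length() - max);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: args[0] is the path to a binary Word .doc file.
        try (InputStream in = new FileInputStream(args[0])) {
            HWPFDocument doc = new HWPFDocument(in);
            PicturesTable table = doc.getPicturesTable();
            Range range = doc.getRange();
            StringBuilder textSoFar = new StringBuilder();

            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph p = range.getParagraph(i);
                for (int j = 0; j < p.numCharacterRuns(); j++) {
                    CharacterRun cr = p.getCharacterRun(j);
                    if (table.hasPicture(cr)) {
                        // POI builds the Picture straight from this run (bytes loaded).
                        Picture pic = table.extractPicture(cr, true);
                        System.out.printf("%s: paragraph %d, offset %d, after \"%s\"%n",
                                pic.suggestFullFileName(), i, cr.getStartOffset(),
                                tail(textSoFar.toString(), 40));
                    } else {
                        // Paragraph marks become spaces so the context stays on one line.
                        textSoFar.append(cr.text().replace('\r', ' '));
                    }
                }
            }
        }
    }
}
```

Run it as `java PictureLocator report.doc` with the `poi-scratchpad` jar (which contains HWPF) and its dependencies on the classpath.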
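You can find a good example of this in the Apache Tika parser for Microsoft Word .doc files, which is built on Apache POI.
PicturesSource is not a POI class, so you'll need to add it yourself: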
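import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.model.PicturesTable;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Picture;
import org.apache.poi.hwpf.usermodel.Range;

public class PicturesSource {
    private PicturesTable picturesTable;
    private Set<Picture> output = new HashSet<Picture>();
    private Map<Integer, Picture> lookup;
    private List<Picture> nonU1based;
    private List<Picture> all;
    private int pn = 0;

    public PicturesSource(HWPFDocument doc) {
        picturesTable = doc.getPicturesTable();
        all = picturesTable.getAllPictures();

        // Index every picture by its offset in the data stream, which is
        // the value a picture CharacterRun reports via getPicOffset().
        lookup = new HashMap<Integer, Picture>();
        for (Picture p : all) {
            lookup.put(p.getStartOffset(), p);
        }

        // Start with all pictures, then blank out each one that a character
        // run in the main text refers to; what stays non-null is unclaimed.
        nonU1based = new ArrayList<Picture>();
        nonU1based.addAll(all);
        Range r = doc.getRange();
        for (int i = 0; i < r.numCharacterRuns(); i++) {
            CharacterRun cr = r.getCharacterRun(i);
            if (picturesTable.hasPicture(cr)) {
                Picture p = getFor(cr);
                // Skip runs whose offset has no match, and pictures already
                // claimed by an earlier run (indexOf then returns -1).
                if (p != null) {
                    int at = nonU1based.indexOf(p);
                    if (at >= 0) {
                        nonU1based.set(at, null);
                    }
                }
            }
        }
    }

    // The private helpers below come from Tika, where the surrounding
    // extractor calls them; they are unused in this standalone copy.
    private boolean hasPicture(CharacterRun cr) {
        return picturesTable.hasPicture(cr);
    }

    private void recordOutput(Picture picture) {
        output.add(picture);
    }

    private boolean hasOutput(Picture picture) {
        return output.contains(picture);
    }

    private int pictureNumber(Picture picture) {
        return all.indexOf(picture) + 1;
    }

    // Maps a picture run back to its Picture through the run's picture offset.
    public Picture getFor(CharacterRun cr) {
        return lookup.get(cr.getPicOffset());
    }

    private Picture nextUnclaimed() {
        Picture p = null;
        while (pn < nonU1based.size()) {
            p = nonU1based.get(pn);
            pn++;
            if (p != null) return p;
        }
        return null;
    }
}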
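The constructor's bookkeeping is easier to follow without a .doc file at hand. The toy model below mirrors it with plain Java types (Java 16+ for the record): a hypothetical Pic record stands in for POI's Picture, and a list of integers stands in for the getPicOffset() values of the picture runs. Whatever is left over corresponds to the entries PicturesSource keeps non-null in nonU1based. Judging by the name, these are pictures not anchored by an inline picture run in the main text (for example, pictures stored as drawing objects), so walking the character runs will not give you a position for them.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnclaimedDemo {
    // Hypothetical stand-in for POI's Picture: a name plus its data offset.
    record Pic(String name, int offset) {}

    // Mirrors the PicturesSource constructor: index pictures by offset, then
    // blank out every picture that some run points at, and keep the rest.
    static List<Pic> unclaimed(List<Pic> all, List<Integer> runPicOffsets) {
        Map<Integer, Pic> lookup = new HashMap<>();
        for (Pic p : all) {
            lookup.put(p.offset(), p);
        }
        List<Pic> remaining = new ArrayList<>(all);
        for (int off : runPicOffsets) {
            Pic p = lookup.get(off);
            if (p == null) continue;            // no picture at that offset
            int at = remaining.indexOf(p);
            if (at >= 0) remaining.set(at, null); // claimed by this run
        }
        remaining.removeIf(p -> p == null);     // drop the claimed slots
        return remaining;
    }

    public static void main(String[] args) {
        List<Pic> all = List.of(new Pic("logo", 0), new Pic("chart", 512), new Pic("shape", 2048));
        // Only offsets 0 and 512 are referenced by runs in this toy input.
        System.out.println(unclaimed(all, List.of(0, 512))); // prints [Pic[name=shape, offset=2048]]
    }
}
```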