Delete an image from a PDF file using PDFBox
I am attempting to delete images from a PDF using Java and PDFBox. The images are not inline, and the PDF does not have patterns or forms. The PDF file contains 2 images. The PDFDebugger tool shows Resources >> XObject >> IM3 and IM5. The problem is: when I display the output PDF file, the images have not been deleted.
public class DeleteImage {
    public static void removeImages(String pdfFile) throws Exception {
        PDDocument document = PDDocument.load(new File(pdfFile));
        for (PDPage page : document.getPages()) {
            PDResources pdResources = page.getResources();
            pdResources.getXObjectNames().forEach(propertyName -> {
                if(!pdResources.isImageXObject(propertyName)) {
                    return;
                }
                PDXObject o;
                try {
                    o = pdResources.getXObject(propertyName);
                    if (o instanceof PDImageXObject) {
                        System.out.println("propertyName" + propertyName);
                        page.getCOSObject().removeItem(propertyName);
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            for (COSName name : page.getResources().getPatternNames()) {
                PDAbstractPattern pattern = page.getResources().getPattern(name);
                System.out.println("have pattern");
            }
            PDFStreamParser parser = new PDFStreamParser(page);
            parser.parse();
            List<Object> tokens = parser.getTokens();
            System.out.println("original tokens size" + tokens.size());
            List<Object> newTokens = new ArrayList<Object>();
            for(int j=0; j<tokens.size(); j++) {
                Object token = tokens.get( j );
                if( token instanceof Operator ) {
                    Operator op = (Operator)token;
                    System.out.println("operation" + op.getName());
                    //find image - remove it
                    if( op.getName().equals("Do") ) {
                        System.out.println("op equals Do");
                        newTokens.remove(newTokens.size()-1);
                        continue;
                    } else if ("BI".equals(op.getName())) {
                        System.out.println("inline -- op equals BI");
                    } else {
                        System.out.println("op not quals Do");
                    }
                }
                newTokens.add(token);
            }
            PDDocument newDoc = new PDDocument();
            PDPage newPage = newDoc.importPage(page);
            newPage.setResources(page.getResources());
            System.out.println("tokens size" + newTokens.size());
            PDStream newContents = new PDStream(newDoc);
            OutputStream out = newContents.createOutputStream();
            ContentStreamWriter writer = new ContentStreamWriter( out );
            writer.writeTokens( newTokens);
            out.close();
            newPage.setContents( newContents );
        }
        document.save("RemoveImage.pdf");
        document.close();
    }
    public static void remove(String pdfFile) throws Exception {
        PDDocument document = PDDocument.load(new File(pdfFile));
        PDResources resources = null;
        for (PDPage page : document.getPages()) {
            resources = page.getResources();
            for (COSName name : resources.getXObjectNames()) {
                PDXObject xobject = resources.getXObject(name);
                if (xobject instanceof PDImageXObject) {
                    System.out.println("have image");
                    removeImages(pdfFile);
                }
            }
        }
        document.save("RemoveImage.pdf");
        document.close();
    }
}
If you call remove ...
In remove you load the PDF into document, iterate over its pages, and for every image XObject you find you call removeImages, which loads the same original file again, processes it, and saves the result as "RemoveImage.pdf". After all this processing you save the unchanged document to "RemoveImage.pdf". So in that last step you overwrite any changes you may have made in removeImages and end up with your original file in "RemoveImage.pdf"!
If you call removeImages directly ...
In removeImages you do make some changes, but there are certain issues:
Whenever you find an image XObject resource, you attempt to remove it from the page directly
page.getCOSObject().removeItem(propertyName);
but the image XObject resource is not a direct child of the page, it is managed by pdResources, so you should remove it from there.
You remove all Do instructions from the page content, not only those for image XObjects, so you probably remove more than you wanted.
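The two issues above can be illustrated with a small, library-free model. This is only a sketch and deliberately not PDFBox code: the content stream is modelled as a list of strings, the XObject resource dictionary as a map from resource name to subtype ("Image" or "Form"), and the class and method names are made up for illustration. In the real code the corresponding check would be something like the pdResources.isImageXObject(...) call already used in the question, applied to the name operand that precedes each Do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Library-free model of the fix; NOT PDFBox code.
// Token encoding and all names here are illustrative assumptions.
class ImageRemovalSketch {

    /**
     * Returns a copy of the tokens without the "/Name Do" pairs whose name
     * refers to an image XObject, and removes those names from xObjects.
     * xObjects maps a resource name (e.g. "Im3") to its subtype.
     */
    static List<String> removeImageInvocations(List<String> tokens,
                                               Map<String, String> xObjects) {
        List<String> kept = new ArrayList<>();
        List<String> removedNames = new ArrayList<>();
        for (String token : tokens) {
            if (token.equals("Do") && !kept.isEmpty()) {
                // The operand of Do is the name token written just before it.
                String operand = kept.get(kept.size() - 1);
                String name = operand.startsWith("/") ? operand.substring(1) : operand;
                if ("Image".equals(xObjects.get(name))) {
                    kept.remove(kept.size() - 1); // drop the name operand
                    removedNames.add(name);
                    continue;                     // drop the Do operator itself
                }
            }
            kept.add(token); // everything else, including Do for forms, stays
        }
        // Remove the images from the resource map only after the scan,
        // so an image drawn twice is still recognised the second time.
        for (String name : removedNames) {
            xObjects.remove(name);
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> xObjects = new LinkedHashMap<>();
        xObjects.put("Im3", "Image");
        xObjects.put("Fm1", "Form");
        xObjects.put("Im5", "Image");
        List<String> tokens = Arrays.asList(
                "q", "/Im3", "Do", "Q", "/Fm1", "Do", "q", "/Im5", "Do", "Q");
        System.out.println(removeImageInvocations(tokens, xObjects));
        System.out.println(xObjects.keySet());
    }
}
```

The point of the sketch is that the decision to drop a Do is made per operand, and that the entry is removed from the XObject map rather than from the page itself. Any surrounding q/Q or cm instructions are left in place, which is harmless but not minimal.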