java.lang.OutOfMemoryError when validating pdf with pdfbox preflight 2.0.13
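I am not sure whether anyone else has run into this, but I get an out-of-memory error when validating a PDF. I am posting here for visibility; any help would be great.
If anyone has any ideas, please share. At this point I really cannot move forward.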
Things I have tried
Followed the advice in the wiki (PDFBox FAQ), without success
Increased the maximum heap size from 2GB to 4GB
Removed the JVM arg: -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Tried JDK 1.7
My environment
Java info
Version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
JVM args used
java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Sample PDF
Console output
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
Sample code
import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;

public class Validator {
    private File file = null;
    private List<ValidationError> errorList = new ArrayList<ValidationError>();

    public Validator(File file) {
        this.file = file;
    }

    public List<ValidationError> getErrors() {
        return errorList;
    }

    // Returns true when validation reported at least one error.
    public boolean validate() throws Exception {
        PreflightParser parser = null;
        PreflightDocument document = null;
        ValidationResult result = null;
        try {
            parser = new PreflightParser(file);
            parser.parse();
            document = parser.getPreflightDocument();
            document.validate();
            result = document.getResult();
            errorList = result.getErrorsList();
        }
        catch (Exception e) {
            throw e;
        }
        finally {
            if (document != null) {
                try {
                    document.close();
                } catch (Exception ignored) {}
            }
            parser = null;
            document = null;
            result = null;
        }
        return !errorList.isEmpty();
    }
}
When I add these options:
-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g
it fails again. I used VisualVM to analyze the heap dump file and found something interesting.
// org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
{
    COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
    COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
    if (groupDictionary != null)
    {
        String sVal = groupDictionary.getNameAsString(COSName.S);
        if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
        {
            context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
                    "Group has a transparency S entry or the S entry is null"));
        }
    }
}
It creates a ValidationError object, and the constructor is:
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
        sb.append(this.details).append(", ").append(details);
        this.details = sb.toString();
    }
    this.cause = cause;
    t = new Exception();
}
As you can see, every time an error is reported it creates a new ValidationError, and with it a new StringBuilder and a new details string.
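To make that cost concrete, here is a minimal sketch. It assumes a stand-in class (`DetailDemoError`, a hypothetical name, not the PDFBox class) that copies only the string concatenation from the constructor above, and it counts by object identity how many separate detail strings exist when the same message is reported repeatedly.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors only the string handling of the
// constructor shown above; it is NOT the PDFBox ValidationError class.
class DetailDemoError {
    final String details;

    DetailDemoError(String codeText, String details) {
        StringBuilder sb = new StringBuilder(codeText.length() + details.length() + 2);
        sb.append(codeText).append(", ").append(details);
        this.details = sb.toString(); // a fresh String instance on every call
    }
}

class DuplicateDetailsDemo {
    // Builds `count` errors that all carry the same message text.
    static List<DetailDemoError> createErrors(int count) {
        List<DetailDemoError> errors = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            errors.add(new DetailDemoError("Invalid graphics object", "same message every time"));
        }
        return errors;
    }

    // Counts distinct String objects by identity (==), not by equals().
    static int distinctDetailInstances(List<DetailDemoError> errors) {
        Map<String, Boolean> seen = new IdentityHashMap<>();
        for (DetailDemoError e : errors) {
            seen.put(e.details, Boolean.TRUE);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<DetailDemoError> errors = createErrors(1000);
        System.out.println("errors: " + errors.size());
        System.out.println("distinct detail strings: " + distinctDetailInstances(errors));
    }
}
```

Both numbers come out equal: every error holds its own copy of an identical string. In the real constructor each error also allocates a new Exception, which records a stack trace, so the per-error cost is likely higher than the string alone.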
So you have three ways to solve this problem:
// commonDetailMap is assumed to be a static Map<String, String> field added to the class
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        String key = errorCode + details;
        if (commonDetailMap.containsKey(key)) {
            // reuse the detail string built for an earlier identical error
            this.details = commonDetailMap.get(key);
        } else {
            StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
            sb.append(this.details).append(", ").append(details);
            this.details = sb.toString();
            commonDetailMap.put(key, this.details);
        }
    }
    this.cause = cause;
    t = new Exception();
}
I think using a Map to avoid creating too many StringBuilders would work. But if the error codes and details take many different values, the Map will grow too large.
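One possible way to keep such a Map from growing without bound is to cap its size. The following is only a sketch under my own assumptions: `BoundedDetailCache`, `detailsFor` and the `'\u0000'` key separator are hypothetical names and choices, not anything in PDFBox. It relies on the standard `LinkedHashMap.removeEldestEntry` hook to evict the least recently used entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: a size-capped cache of detail strings.
class BoundedDetailCache {
    private final int maxEntries;
    private final Map<String, String> cache;

    BoundedDetailCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                // called after each put; returning true evicts the eldest entry
                return size() > BoundedDetailCache.this.maxEntries;
            }
        };
    }

    // Returns a shared detail string for the given pair, building it on a cache miss.
    synchronized String detailsFor(String codeText, String details) {
        String key = codeText + '\u0000' + details; // separator avoids ambiguous keys
        String cached = cache.get(key);
        if (cached == null) {
            cached = codeText + ", " + details;
            cache.put(key, cached);
        }
        return cached;
    }

    synchronized int size() {
        return cache.size();
    }

    public static void main(String[] args) {
        BoundedDetailCache cache = new BoundedDetailCache(2);
        String first = cache.detailsFor("E1", "x");
        String second = cache.detailsFor("E1", "x");
        System.out.println(first == second); // same instance is reused
        cache.detailsFor("E2", "y");
        cache.detailsFor("E3", "z");
        System.out.println(cache.size());    // never exceeds the cap of 2
    }
}
```

Note that each lookup still builds a short-lived key string, so this reduces what is retained on the heap rather than eliminating allocation.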
So another way to change the source code is:
public ValidationError(String errorCode, String details, Throwable cause)
{
    this(errorCode);
    if (details != null)
    {
        StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
        sb.append(this.details).append(", ").append(details);
        // invoke intern
        this.details = sb.toString().intern();
    }
    this.cause = cause;
    t = new Exception();
}
The documentation for intern() says:
Returns a canonical representation for the string object.
I think using intern() is the better option.
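A small self-contained sketch of what intern() changes, assuming a hypothetical `build` helper that concatenates the same way as the constructor above (`InternDemo` is an illustrative name, not PDFBox code):

```java
// Illustrative only: shows that equal strings built separately are distinct
// objects, while their interned forms are one shared object.
class InternDemo {
    // Hypothetical helper mirroring the concatenation in the constructor above.
    static String build(String prefix, String details) {
        StringBuilder sb = new StringBuilder(prefix.length() + details.length() + 2);
        sb.append(prefix).append(", ").append(details);
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = build("Invalid graphics object", "same message");
        String b = build("Invalid graphics object", "same message");

        System.out.println(a.equals(b));              // true: same characters
        System.out.println(a == b);                   // false: two separate objects
        System.out.println(a.intern() == b.intern()); // true: one canonical object
    }
}
```

The temporary string is still created before intern() is called, but once it is replaced by the canonical copy the duplicate becomes unreachable and can be collected, so a large number of identical errors should retain only one details string.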