[英]java.lang.OutOfMemoryError when validating pdf with pdfbox preflight 2.0.13
PDFBOX-4450 Details on Issue PDFBOX-4450 问题详情
Not sure if anyone has encountered this issue, but am getting an outofmemory exception when validating pdf's.不确定是否有人遇到过这个问题,但在验证 pdf 时遇到内存不足异常。 Posting here for visibility, if anyone could help that would be awesome.在这里发布以提高知名度,如果有人可以提供帮助,那就太棒了。
If anyone has any ideas, please share.如果有人有任何想法,请分享。 At this point I can't really move forward.在这一点上,我真的无法继续前进。
Stuff I've tried我试过的东西
Followed suggestions in wiki without success PDFBox faq遵循维基中的建议但没有成功PDFBox 常见问题解答
Increased max heap size from 2GB to 4GB最大堆大小从 2GB 增加到 4GB
Removed jvm arg:-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider删除了 jvm arg:-Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Tried using jdk 1.7尝试使用 jdk 1.7
My Environment我的环境
Java info Java信息
java -version版本
java version "1.8.0_131" java版本“1.8.0_131”
Java(TM) SE Runtime Environment (build 1.8.0_131-b11) Java(TM) SE 运行时环境(构建 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode) Java HotSpot(TM) 64 位服务器 VM(构建 25.131-b11,混合模式)
JVM Args used使用的 JVM 参数
java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider java -Xmx2048m -Dsun.java2d.cmm=sun.java2d.cmm.kcms.KcmsServiceProvider
Example pdf示例pdf
Pdf from PDFBOX-4450 PDFBOX-4450 中的 PDF
Console Output控制台输出
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font Symbol
Jan 30, 2019 10:25:58 AM org.apache.pdfbox.pdmodel.font.PDType1Font <init>
WARNING: Using fallback font ArialMT for base font ZapfDingbats
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3664)
at java.lang.String.<init>(String.java:207)
at java.lang.StringBuilder.toString(StringBuilder.java:407)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1587)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.getDictionaryString(COSDictionary.java:1559)
at org.apache.pdfbox.cos.COSDictionary.toString(COSDictionary.java:1531)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.checkGroup(XObjFormValidator.java:138)
at org.apache.pdfbox.preflight.xobject.XObjFormValidator.validate(XObjFormValidator.java:73)
at org.apache.pdfbox.preflight.process.reflect.GraphicObjectPageValidationProcess.validate(GraphicObjectPageValidationProcess.java:74)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
at org.apache.pdfbox.preflight.utils.ContextHelper.validateElement(ContextHelper.java:57)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validateXObjects(ResourcesValidationProcess.java:224)
at org.apache.pdfbox.preflight.process.reflect.ResourcesValidationProcess.validate(ResourcesValidationProcess.java:81)
at org.apache.pdfbox.preflight.utils.ContextHelper.callValidation(ContextHelper.java:84)
Sample code示例代码
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.preflight.PreflightDocument;
import org.apache.pdfbox.preflight.ValidationResult;
import org.apache.pdfbox.preflight.ValidationResult.ValidationError;
import org.apache.pdfbox.preflight.parser.PreflightParser;
public class Validator {
private File file = null;
private List<ValidationError> errorList = new ArrayList<ValidationError>();
public Validator(File file) {
this.file = file;
}
public List<ValidationError> getErrors(){
return errorList;
}
public boolean validate() throws Exception{
PreflightParser parser = null;
PreflightDocument document = null;
ValidationResult result = null;
try {
parser = new PreflightParser(file);
parser.parse();
document = parser.getPreflightDocument();
document.validate();
result = document.getResult();
errorList = result.getErrorsList();
}
catch(Exception e) {
throw e;
}
finally {
if(document != null) {
try {
document.close();
}catch(Exception ignored) {}
}
parser = null;
document = null;
result = null;
}
return errorList.size() > 0 ? true : false;
}
}
When I add these options:当我添加这些选项时:
-XX:+HeapDumpOnOutOfMemoryError -Xmx3550m -Xms3550m -Xmn2g
It failed again.又失败了。 And I use VisualVM to analysis the dump heap file.我使用 VisualVM 来分析转储堆文件。 I found something interesting.我发现了一些有趣的东西。
And most of char[]'s content is:而大部分 char[] 的内容是:
//org.apache.pdfbox.preflight.process.reflect.SinglePageValidationProcess#validateGroupTransparency
protected void validateGroupTransparency(PreflightContext context, PDPage page) throws ValidationException
{
COSBase baseGroup = page.getCOSObject().getItem(XOBJECT_DICTIONARY_KEY_GROUP);
COSDictionary groupDictionary = COSUtils.getAsDictionary(baseGroup, context.getDocument().getDocument());
if (groupDictionary != null)
{
String sVal = groupDictionary.getNameAsString(COSName.S);
if (XOBJECT_DICTIONARY_VALUE_S_TRANSPARENCY.equals(sVal))
{
context.addValidationError(new ValidationError(ERROR_GRAPHIC_TRANSPARENCY_GROUP,
"Group has a transparency S entry or the S entry is null"));
}
}
}
It create a ValidationError object, but the constructor is:它创建了一个 ValidationError 对象,但构造函数是:
public ValidationError(String errorCode, String details, Throwable cause)
{
this(errorCode);
if (details != null)
{
StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
sb.append(this.details).append(", ").append(details);
this.details = sb.toString();
}
this.cause = cause;
t = new Exception();
}
You can see that, once there is a error, it create the ValidationError and create a StringBuilder.您可以看到,一旦出现错误,它就会创建 ValidationError 并创建一个 StringBuilder。
So, you have three ways to solve the problem:所以,你有三种方法来解决这个问题:
public ValidationError(String errorCode, String details, Throwable cause)
{
this(errorCode);
if (details != null)
{
String key = errorCode + details;
if (commonDetailMap.containsKey(key)) {
this.details = commonDetailMap.get(key);
} else {
StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
sb.append(this.details).append(", ").append(details);
this.details = sb.toString();
commonDetailMap.put(key, this.details);
}
}
this.cause = cause;
t = new Exception();
}
I think using a Map to avoid creating too may StringBuilder would work.我认为使用 Map 来避免创建太可能 StringBuilder 会起作用。 But the Map would be too large if the error code and details are multivalued.但是如果错误代码和详细信息是多值的,则 Map 会太大。
So, the another way to change the source code is:因此,另一种更改源代码的方法是:
public ValidationError(String errorCode, String details, Throwable cause)
{
this(errorCode);
if (details != null)
{
StringBuilder sb = new StringBuilder(this.details.length() + details.length() + 2);
sb.append(this.details).append(", ").append(details);
// invoke intern
this.details = sb.toString().intern();
}
this.cause = cause;
t = new Exception();
}
The intern() is:实习生()是:
Returns a canonical representation for the string object.
I think that using intern() is better.我认为使用 intern() 更好。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.